Preface



It was our great pleasure to hold the 2nd International Symposium on Automated Tech- 
nology on Verification and Analysis (ATVA) in Taipei, Taiwan, ROC, October 31- 
November 3, 2004. The series of ATVA meetings is intended for the promotion of related 
research in eastern Asia. In the last decade, automated technology on verification has 
become the new strength in industry and brought forward various hot research activities 
in both Europe and the USA. In comparison, eastern Asia has been quiet in the forum. With
more and more IC design houses moving from Silicon Valley to eastern Asia, we believe
this is a good time to start cultivating related research activities in the region. 

The emphasis of the ATVA workshop series is on various mechanical and informative 
techniques, which can give engineers valuable feedback to quickly converge their designs
according to the specifications. The scope of interest includes the following research
areas: model-checking theory, theorem-proving theory, state-space reduction techniques,
languages in automated verification, parametric analysis, optimization, formal
performance analysis, real-time systems, embedded systems, infinite-state systems, Petri nets,
UML, synthesis, tools, and practice in industry. 

As a young symposium, ATVA 2004 succeeded in attracting 69 submissions from 
all over the world. All submissions were rigorously reviewed by three reviewers and 
discussed by the PC members through the network. The final program included a general 
symposium and three special tracks: (1) Design of secure/high-reliability networks, 
(2) HW/SW coverification and cosynthesis, and (3) hardware verification. The general
symposium consisted of 24 regular papers and 8 short papers. The three special tracks
together accepted 7 papers. The final program also included three keynote speeches by
Bob Kurshan, Rajeev Alur, and Pei-Hsin Ho; and three invited speeches by Jean-Pierre 
Jouannaud, Tevfik Bultan, and Shaoying Liu. The symposium was also preceded by 
three tutorials by Bob Kurshan, Rajeev Alur, and Pei-Hsin Ho. 

We want to thank the National Science Council, Ministry of Education, and 
Academia Sinica of Taiwan, ROC. Without their support, ATVA 2004 would not have
become a reality. We thank the Department of Electrical Engineering, Center for
Information and Electronics Technologies (CIET), SOC Center, and Graduate Institute of
Electronic Engineering (GIEE) of National Taiwan University for their sturdy 
support, and we thank Synopsys, Inc. for sponsoring ATVA 2004. We thank all the 
tutorial-keynote speakers, invited speakers, committee members, and reviewers of ATVA 
2004. Finally, we thank Mr. Rong-Shiung Wu, for his help in maintaining the webpages
and compiling the proceedings, and Mr. Lin-Zan Cai, for his help in all the paperwork. 
Games for Formal Design and Verification of
Reactive Systems 



Rajeev Alur 

University of Pennsylvania, USA 



Abstract. With recent advances in algorithms for state-space traver- 
sal and in techniques for automatic abstraction of source code, model 
checking has emerged as a key tool for analyzing and debugging software 
systems. This talk discusses the role of games in modeling and analysis 
of software systems. Games are useful in modeling open systems where 
the distinction among the choices controlled by different components is 
made explicit. We first describe the model checker Mocha that supports 
a game-based temporal logic for writing requirements, and its applica- 
tions to analysis of multi-party security protocols. Then, we describe 
how to automatically extract dynamic interfaces for Java classes using 
predicate abstraction for extracting a boolean model from a class file, 
and learning algorithms for constructing the most general strategy for 
invoking the methods of the model. We discuss an implementation in the 
tool JIST — Java Interface Synthesis Tool, and demonstrate that the tool 
can construct interfaces, accurately and efficiently, for sample Java2SDK 
library classes. 
Evolution of Model Checking into the EDA Industry



Robert R Kurshan 
Cadence Design Systems, USA 

Today, the Electronic Design Automation (EDA) industry is making its second attempt 
to commercialize model checking tools for hardware verification. Its first attempt started
about 6 years ago, in 1998. While this first attempt was only barely successful commer- 
cially, it resonated well enough with customers of the model checking tool vendors to 
motivate a second round of commercial offerings in 2004. 

Why has it taken almost a quarter century for model checking to surface in a
commercial venue? Why did the great academic tools of the ’80s and ’90s not translate more
directly into useful commercial tools?

In retrospect, there are three clear answers to these questions: 

1. application of model checking in commercial flows requires a significant change 
in design methodology, advancing verification from post-development to the early 
design stage, onto the shoulders of developers; historically, developers have been 
considered “too valuable” to burden with testing (“verification” and “test” are not
distinguished in EDA); 

2. a commercial-quality tool is expensive to deploy, requiring verification experts to
program the core algorithms with close attention to performance, and beyond this
requiring significant efforts in developing use models, the integrated tool architecture,
user interfaces, documentation, product quality validation (product testing), market- 
ing, customer support and - very critically - convincing the Sales Team (who work 
on commission and/or bonuses based on sales volume) that they should put in a lot 
of effort to learn and sell a new tool when they already have an established customer 
base for the commoditized tools like simulators that they can sell in million dollar 
batches with a phone call; 

3. given 1., the market for these tools was hard to estimate, so it was hard or impossible 
to calculate the expected return on investment of the daunting costs in 2. 

Nonetheless, in the ’90s, the growing inadequacy of existing functional verification
methods was reaching crisis proportions on account of the inability of simulation test 
to keep up with exponentially growing design complexity. There was growing pressure 
on the EDA industry to provide better support for weeding out of circuit designs an in- 
creasing number of disastrous functional bugs, before those designs hit the marketplace. 

Previously, proof-of-concept demonstrations of the value of formal verification, at
least in the hands of experts, had become ever more persuasive, with many demonstration
projects in academia and industry that showed the potential of formal verification to solve
the looming test crisis. 

Around 1998, the EDA industry responded timidly to these pressures by releasing 
under-funded and thus inadequate answers to these industry needs. Lack of funding
resulted in short-changing one or more of the requirements cited in 2. above, and/or 
failing to adequately address the issue 1. The result was a lot of sparks of interest, 
even some flames of satisfaction from well-positioned users, but no broadly sustainable
verification product that could be fanned out widely in EDA. 

However, the sparks and flames did catch the attention of EDA management enough 
to fund a second round. The first focus of the second round was to evaluate the failures 
of the first round and devise solutions for these failures. 

The first issue to address was 1. Designers are widely treated like “prima donnas”,
whereas “product verification” is often considered to be an entry level job from which
one seeks to advance to more glamorous work like design. Therefore, to ask the designer 
to support verification was largely considered by management to be a non-starter. 

Since the classical test flow consisted of handing the completed design together 
with a system specification to a testing group, it was natural at some level for manage- 
ment to presume that by analogy they should hand off a completed design to a model 
checking team. Since there was little available expertise in model checking, they looked 
to academia for this expertise. In the first model checking flows, newly hired formal 
verification Ph.D.s augmented with summer students served as the first model checking
teams. 

The trouble with this setup was that whereas classical test teams could infer system 
tests from a system specification with which they were provided, the intrinsic computa- 
tional capacity limitations on model checking required that model checking be applied at 
the design block level. There are generally no design specifications for individual blocks 
beyond rough engineering notes that are rarely up to date and often hard to understand. 

These model checking teams were thus forced to spend an inordinate amount of 
time studying block designs in order to come up with suitable properties to check. 
Often, this process included quizzing the designers, which some designers resented as 
an unwelcome intrusion on their time, or else management feared that it would be that. 
Moreover, it was insufficient to learn only the blocks to be tested. It was also required 
to learn the environment of those blocks, in order to design an “environment model” or
constraints for the blocks to be verified, in order to preclude false failures. Getting the 
environment model right was often the hardest part of the process, as it required learning 
a large part of the design, far beyond the portion to be checked. 

In summary, using a dedicated model checking team was not a solution that would 
scale to a general widely deployed practice. The team was too far from the design to be 
able to easily understand it as required, and extracting the required information from the 
designers was considered too disruptive to the designers. 

For the second round the clear priorities, in order, were these:

1. FIRST, focus on USABILITY to break into the current development flow;

2. then, focus on CAPACITY: how to scale the use model to the same size designs to which
simulation test applies; 

3. finally, focus on PERFORMANCE in order to get results fast enough to augment 
and keep up with the normal test flow. 

The syllogism went like this: formal verification must be applied to design blocks, 
on account of capacity limitations; but, only the designer understands a design at the 
granularity of its blocks; therefore, it must be the designer who facilitates formal verifi- 
cation. 
On the one hand, advancing verification in the development flow to the earlier design
phase offered a big potential advantage. It could reduce development costs by finding
bugs earlier, thereby saving more costly fixes later. But this left unanswered how to break
through the cultural barrier: designers don’t do test!

The answer to this puzzle came through the evolution of assertion languages. An 
assertion language is a formal language that is used to specify properties to be checked 
in a design. The most common assertion languages (although they were not called that) 
were the logics LTL and CTL, in use in academia for two decades. These were hard to 
assimilate (even for the experts) and only meek attempts were made to introduce them to 
designers. In 1995, along with one of the first commercial model checkers, FormalCheck 
from Lucent Technologies, came a very simple and intuitive assertion language: the 
FormalCheck Query Language (FQL). FQL was strictly more expressive than LTL, 
being able to express any ω-regular language. It was expressed through templates like

After(e) Always(f) Unless(d)

After(e) Eventually(d)

where e, f and d are Boolean expressions in design variables and the template expressions
imply universal quantification over design states. FQL made it harder to write complex
logic expressions, but simpler and more transparent to write simple common expressions.
By conjoining such simple templates, any ω-regular property could be expressed.
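To make the two templates concrete, the following Python sketch evaluates them over a finite simulation trace. It is only an illustration under stated assumptions: FQL itself is defined over infinite behaviors and its precise semantics are not given here, so the finite-trace reading (the obligation starts in the state after e holds, and d discharges it) and the names after_always_unless, after_eventually, sig and the req/busy/done signals are all hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Pred = Callable[[State], bool]


def after_always_unless(trace: List[State], e: Pred, f: Pred, d: Pred) -> bool:
    """Assumed reading: once e holds, f must hold in every later state
    until a state where d holds discharges the obligation."""
    armed = False
    for s in trace:
        if armed:
            if d(s):
                armed = False          # obligation discharged
            elif not f(s):
                return False           # f violated while obligation active
        if not armed and e(s):
            armed = True               # obligation starts in the next state
    return True


def after_eventually(trace: List[State], e: Pred, d: Pred) -> bool:
    """Assumed reading: every state where e holds is followed, strictly
    later and within the finite trace, by a state where d holds."""
    pending = False
    for s in trace:
        if pending and d(s):
            pending = False
        if e(s):
            pending = True
    return not pending


def sig(name: str) -> Pred:
    """Boolean expression that simply reads one design variable."""
    return lambda s: s[name]


good = [
    {"req": True, "busy": False, "done": False},
    {"req": False, "busy": True, "done": False},
    {"req": False, "busy": False, "done": True},
]
# Same prefix, but the design drops busy without ever signalling done.
bad = good[:2] + [{"req": False, "busy": False, "done": False}]
```

Under these assumptions, the trace good satisfies both After(req) Always(busy) Unless(done) and After(req) Eventually(done), while bad violates both.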

IBM also saw a need to make the assertion language more accessible to designers, 
but took another approach. They implemented a textual version of CTL in their model 
checker RuleBase. Intel, Motorola and others also found solutions to the problem
of making assertion languages more palatable to designers, in some cases by greatly 
restricting expressiveness. One example in this direction was Verplex’s OVL, a template- 
based assertion language like FQL, but significantly less expressive although possibly 
even simpler to understand. 

Designers were encouraged to use an assertion language to write “comments” that
described the correct behavior of their blocks. This was not the same as asking the designer
to participate in testing (verification) - it was only asking the designer to document pre- 
cisely the functional requirements of a block’s design. While some designers also shy 
away from commenting their code, requiring a designer to write a precise functional 
specification of a design is something that development managers have long thought to 
be important, and now with a good concrete justification (to facilitate better verification),
managers bought into this requirement on their designers. 

With assertions in place, the plan was to use a verification team to check the assertions. 
This strategy became known as Assertion-Based Verification. Since the designers wrote the
assertions, there was no need for the verification team to understand the design. Moreover, 
with assertions in every block, there was no need for the verification team to write 
an environment model: the assertions from adjacent blocks served as the environment 
model. 

There was one practical problem with this approach. Managers felt uneasy about
investing considerable resources in a proprietary assertion language: what if the tools that
supported that assertion language were not the best tools? Design managers wanted to 
evaluate the various options in the marketplace and then select the best one. But to
evaluate a tool, one needed assertions written in the language supported by that tool and
that required committing to that language. This dilemma resulted in paralysis in many 
companies, and some reluctance to try any of the available tools. 

Some people in the EDA industry saw through this problem and saw that its resolution 
lay with a standardized assertion language. In fact, standardization was more important 
than the language itself. The result (leaving out the tumult of corporate politics that 
intervened in the process) was the PSL assertion language released as an international
standard in 2003 by the Accellera standardization body. 

With PSL declared as a standard, EDA vendors rushed to support it in their tools. 
This widespread support in turn gave the design managers the confidence to buy into 
assertion-based verification; they could have their designers write PSL assertions and 
then evaluate a spectrum of verification tools from various vendors that support PSL. If 
one vendor’s tool was deemed insufficient, then the next vendor’s tool could be applied 
to the same specification, without need to rewrite the assertions. 

Finally, the means to break into the design development flow had materialized.
Nonetheless, resistance to change was still a barrier. Managers needed confidence that
a significant methodology shift would really pay off. Mature managers know that what
looks good on paper always carries hidden pitfalls that can sink an otherwise
advantageous transition. Thus, their reluctance to change is not mere Luddism but sound
common sense. 

To accommodate this concern, technology transfer needs to be accomplished through 
a succession of small steps, each of which is barely disruptive to the status quo, but 
nonetheless shows some measurable benefit. The lack of disruption is necessary to 
overcome the reasons for resistance to the new technology. The measurable benefit 
is needed to justify the next small step. 

Hence, the evolution of model checking must follow an adoption sequence of small 
but beneficial steps, focused on ease of use. Each vendor has found its own adoption 
sequence, but there is a common theme: start with what can be done fully automatically, 
even without any user-defined assertions. Examples here include checks that variables 
are assigned within their range, no divide-by-zero, the absence of bus collisions and 
combinational cycles, and checks that every logical branch condition is enabled. Given 
success with this utility, which has close to zero user cost and gives some benefit, the 
next “small step” could be “user-assisted” automatic checks. These could include buffer
overflow and underflow, one-hotness, mutual exclusion in arbitration and the like. All the
user needs to do is to designate a structure as a “buffer”, “one-hot register” or “arbiter”.
Bolstered with a successful return on investment for these “pre-defined checks”, the
design manager is likely ready to commit to full “user-defined checks” with
assertion-based verification. Assertions have the useful feature that they can be used equally for
simulation and model checking. This provides another point of reassurance that the new 
methodology will not end in a rat hole. 
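As a small illustration of a “pre-defined check”, the sketch below tests one-hotness of a register over sampled simulation values. It is a hypothetical example and not any vendor's implementation: the names is_one_hot and check_one_hot, and the idea of passing the register as a list of integer samples, are assumptions made for the illustration. A model checker would establish the same condition for all reachable states rather than for the sampled ones.

```python
from typing import List


def is_one_hot(value: int, width: int) -> bool:
    """True if exactly one of the low `width` bits of value is set."""
    masked = value & ((1 << width) - 1)
    # A nonzero number is a power of two iff clearing its lowest set bit gives 0.
    return masked != 0 and masked & (masked - 1) == 0


def check_one_hot(samples: List[int], width: int) -> List[int]:
    """Return the sample indices (e.g. clock cycles) that violate one-hotness."""
    return [i for i, v in enumerate(samples) if not is_one_hot(v, width)]


# Hypothetical 4-bit state register sampled over four cycles.
violations = check_one_hot([0b0001, 0b0100, 0b0110, 0b0000], width=4)
```

Here cycles 2 and 3 are reported: the first has two bits set, the second has none.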

The next small step in the evolutionary adoption sequence is to limit assertions to 
“local” properties of the design: properties that can be verified within a small number
of blocks, using the assertions of adjacent blocks as the environment. This can be
instrumented to run automatically as “automatic assume-guarantee reasoning”. Limiting
to local properties handles the capacity issue. 
Performance is addressed by a technically exciting succession of recent advances in
model checking technology, using SAT solving, ATPG and BDD-based model checking 
in conjunction with one another. Other techniques have been developed to reduce the 
likelihood that errors found in the design are artifacts of the localization of the verification 
process to blocks. These techniques incorporate simulation test and model checking. 

The future looks bright, with the evolution of model checking looking clearer and 
stronger. The next small steps in the adoption sequence move the process up the
abstraction hierarchy to more global design properties.




Abstraction Refinement 



Pei-Hsin Ho 

Synopsys, USA 



Abstract. Formal property verification exhaustively verifies logic de- 
signs against some desired properties of the designs with respect to all 
possible input sequences of any length. Without abstraction, state-of-the-art
formal proof engines usually cannot verify properties of designs with
more than a couple of hundred registers. As a result, formal property ver- 
ification relies on automatic abstraction techniques to verify real-world 
logic designs. 

Abstraction refinement, first introduced by Kurshan in the tool 
COSPAN, is one of the most practical automatic abstraction methods. 
Abstraction refinement incrementally refines an abstract model of the 
design by including more and more detail from the original design until 
the underlying formal property verification engine verifies or falsifies the 
property. 
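The refinement loop described above can be sketched on a toy design as follows. This is a minimal illustration under stated assumptions, not the algorithm of COSPAN or of any commercial engine: the design is a deterministic set of Boolean registers without primary inputs, the abstraction keeps a subset of registers and treats all others as free inputs, and refinement naively adds one fan-in register at a time. All names (find_abstract_cex, is_real, refine_until_decided, and the registers a, b, c, d) are hypothetical.

```python
from collections import deque
from itertools import product


def find_abstract_cex(kept, free, init, nxt, bad):
    """Breadth-first search of the abstract model; returns a path to a bad state or None."""
    order = sorted(kept)
    start = tuple(init[r] for r in order)
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        state = dict(zip(order, cur))
        if bad(state):
            path = []
            while cur is not None:
                path.append(dict(zip(order, cur)))
                cur = parent[cur]
            return path[::-1]
        # Registers outside the abstraction behave like unconstrained inputs.
        for values in product([False, True], repeat=len(free)):
            full = {**state, **dict(zip(free, values))}
            succ = tuple(nxt[r](full) for r in order)
            if succ not in parent:
                parent[succ] = cur
                queue.append(succ)
    return None


def is_real(cex, init, nxt):
    """Replay the abstract trace on the concrete (deterministic) design."""
    state = dict(init)
    for step in cex:
        if any(state[r] != v for r, v in step.items()):
            return False
        state = {r: f(state) for r, f in nxt.items()}
    return True


def refine_until_decided(init, nxt, support, bad, prop_regs):
    kept = set(prop_regs)              # start from the registers the property reads
    while True:
        free = sorted(set(init) - kept)
        cex = find_abstract_cex(kept, free, init, nxt, bad)
        if cex is None:
            return "proved", kept      # the abstraction over-approximates the design
        if is_real(cex, init, nxt):
            return "falsified", kept
        # Spurious trace: add one register from the fan-in of the kept ones.
        # The fan-in cannot be empty here, since then the abstraction would be exact.
        fanin = set().union(*(support[r] for r in kept)) - kept
        kept.add(min(fanin))


# Toy design: a toggles, b is a delayed by one step, c latches (a and b); d is unrelated.
init = {"a": False, "b": False, "c": False, "d": False}
nxt = {
    "a": lambda s: not s["a"],
    "b": lambda s: s["a"],
    "c": lambda s: s["a"] and s["b"],
    "d": lambda s: not s["d"],
}
support = {"a": {"a"}, "b": {"a"}, "c": {"a", "b"}, "d": {"d"}}
bad = lambda s: s["c"]                 # property: c is never set

result, kept = refine_until_decided(init, nxt, support, bad, ["c"])
```

In this example the property is proved after two refinements, with a, b and c in the abstract model and the unrelated register d never included.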

In this talk we give an overview of some of the most interesting abstraction
refinement techniques that have enabled the formal verification of
VLSI logic designs with millions of logic gates. 
Tools for Automated Verification of Web
Services



Tevfik Bultan, Xiang Fu, and Jianwen Su 

Department of Computer Science, University of California 
Santa Barbara, CA 93106, USA; {bultan,fuxiang,su}@cs.ucsb.edu



1 Introduction 

Web-based software applications which enable user interaction through web 
browsers have been extremely successful. Nowadays one can look for and buy al- 
most anything online using such applications, from a book to a car. A promising 
extension to this framework is the area of web services, web-accessible software 
applications which interact with each other using the Web. Web services have 
the potential to have a big impact on business-to-business applications similar to 
the impact interactive web software had on business-to-consumer applications. 

There are various issues that have to be addressed in order to develop successful
web services: a) services implemented using different platforms (such as .NET 
or J2EE) should be able to interact with each other; b) it should be possible to 
modify an existing service without modifying other services that interact with 
it; c) services should be able to tolerate pauses in availability of other services 
and slow data transmission. Web services address these challenges by the follow- 
ing common characteristics: 1) standardizing data transmission via XML [13], 
2) loosely coupling interacting services through standardized interfaces, and 3) 
supporting asynchronous communication. Through the use of these technologies, 
Web services provide a framework for decoupling the interfaces of Web accessible 
applications from their implementations, making it possible for the underlying 
applications to interoperate and integrate into larger, composite services. 

A fundamental question in developing reliable web services is the modeling 
and analysis of their interactions. In the last couple of years, we developed a 
formal model for interactions of composite web services, developed techniques 
for analysis of such interactions, and built a tool implementing these techniques 
[3, 4, 5, 6, 7, 8]. Below we give a brief summary of these contributions. 

2 Conversations 

Our work focuses on composite web services which interact with asynchronous 
messages. We call each individual web service a peer. A composite web service 
consists of a set of peers which interact with each other using asynchronous 
messages [3]. Such a system can be modeled as a set of state machines which 
communicate using unbounded FIFO message queues, as in the communicating 
finite state machine model [2]. When a message is sent, it is inserted at the end
of the receiver’s message queue. In our model, we assume that each peer has a
single queue for incoming messages and receives the messages in the order they
are inserted into the queue.

In order to analyze interactions of asynchronously communicating web ser- 
vices we first need a formal model for their interactions. We model the interac- 
tions in such a system as a conversation, the global sequence of messages that 
are exchanged among the web services [3,9,11]. Note that a conversation does
not specify when the receive events occur; it only specifies the global ordering
of the send events.
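The model above can be illustrated with a small Python sketch that explores two peers with FIFO input queues and collects the conversations of complete runs. It is only a toy under stated assumptions: the dictionary encoding of peers, the function name conversations, and the bound max_sends (needed because queues are unbounded in the actual model) are all hypothetical and are not taken from WSAT.

```python
def conversations(peers, receiver, max_sends=6):
    """Collect conversations (sequences of sent messages) of complete runs.

    peers: name -> {"init": state, "final": set of states,
                    "trans": [(src, "!" or "?", msg, dst), ...]}
    receiver: msg -> name of the peer whose queue receives it
    """
    names = sorted(peers)
    start = (tuple(peers[n]["init"] for n in names), tuple(() for _ in names), ())
    seen, stack, done = {start}, [start], set()
    while stack:
        states, queues, conv = stack.pop()
        # A run is complete when every peer is final and every queue is empty.
        if all(s in peers[n]["final"] for n, s in zip(names, states)) and not any(queues):
            done.add(conv)
        for i, n in enumerate(names):
            for src, kind, msg, dst in peers[n]["trans"]:
                if src != states[i]:
                    continue
                new_states = states[:i] + (dst,) + states[i + 1:]
                if kind == "!" and len(conv) < max_sends:
                    # Send: append to the receiver's queue and record the send event.
                    j = names.index(receiver[msg])
                    new_queues = queues[:j] + (queues[j] + (msg,),) + queues[j + 1:]
                    nxt = (new_states, new_queues, conv + (msg,))
                elif kind == "?" and queues[i][:1] == (msg,):
                    # Receive: consume the head of the own queue; not recorded.
                    new_queues = queues[:i] + (queues[i][1:],) + queues[i + 1:]
                    nxt = (new_states, new_queues, conv)
                else:
                    continue
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return done


# Each peer first sends its own message and then receives the other one.
peers = {
    "A": {"init": 0, "final": {2}, "trans": [(0, "!", "x", 1), (1, "?", "y", 2)]},
    "B": {"init": 0, "final": {2}, "trans": [(0, "!", "y", 1), (1, "?", "x", 2)]},
}
receiver = {"x": "B", "y": "A"}
result = conversations(peers, receiver)
```

Both orderings of the two sends are conversations of this composition. With synchronous communication neither peer could proceed, since each insists on sending before it is willing to receive.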

Given a composite web service, one interesting problem is to check if its 
conversations satisfy an LTL property. Due to asynchronous communication via 
unbounded FIFO queues this problem is undecidable [4]. 

3 Realizability and Synchronizability 

A composite web service can be specified in either top-down or bottom-up fash- 
ion. In the top-down approach the desired conversation set of the composite 
web service is specified as a conversation protocol. In the bottom-up approach 
each peer is specified individually and a composite web service is specified by 
combining a set of asynchronously communicating peers. For both approaches, 
our goal is to verify LTL properties of the set of conversations generated by the 
composition. 

There are two interesting properties within this framework: realizability and 
synchronizability. A conversation protocol is realizable if the corresponding con- 
versation set can be generated by a set of asynchronously communicating web 
services. On the other hand, a set of asynchronously communicating web services
is synchronizable if its conversation set does not change when asynchronous
communication is replaced with synchronous communication. We developed suf- 
ficient conditions for realizability and synchronizability that can be checked au- 
tomatically [4,5,7]. 

Using the realizability analysis, reliable web services can be developed in a 
top-down fashion as follows: 1) A conversation protocol is specified and checked 
for realizability; 2) The properties of the conversation protocol are verified using 
model checking; 3) The peer implementations are synthesized from the conver- 
sation protocol via projection. 
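Step 3 can be sketched as follows. The sketch is a naive projection under stated assumptions: the conversation protocol is given as a list of (source, message, target) transitions, each message has a fixed sender and receiver, and transitions that do not involve the peer are kept as silent "eps" moves rather than being eliminated. The function name project and the customer/shop/bank example are hypothetical; a complete synthesis would also remove the silent moves and rely on the realizability check of step 1.

```python
def project(protocol, peer, sender, receiver):
    """Project a conversation protocol onto one peer.

    A message the peer sends becomes a send ("!"), a message it receives
    becomes a receive ("?"), and any other message becomes a silent move.
    """
    local = []
    for src, msg, dst in protocol:
        if sender[msg] == peer:
            local.append((src, "!", msg, dst))
        elif receiver[msg] == peer:
            local.append((src, "?", msg, dst))
        else:
            local.append((src, "eps", msg, dst))
    return local


# Hypothetical three-peer protocol: order, then charge, then receipt.
protocol = [(0, "order", 1), (1, "charge", 2), (2, "receipt", 3)]
sender = {"order": "customer", "charge": "shop", "receipt": "shop"}
receiver = {"order": "shop", "charge": "bank", "receipt": "customer"}

customer = project(protocol, "customer", sender, receiver)
bank = project(protocol, "bank", sender, receiver)
```

The customer peer sends order, skips charge silently, and receives receipt; the bank peer only receives charge.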

Similarly, synchronizability analysis enables development of reliable web ser- 
vices in a bottom-up fashion. If a web service composition is synchronizable, we 
can verify its behavior without any input queues and the verification results will 
hold for the asynchronous communication semantics (with unbounded queues). 



4 Web Services Analysis Tool 

We developed a tool which implements the techniques mentioned above. Web 
Service Analysis Tool (WSAT) [8,12] verifies LTL properties of conversations
and checks sufficient conditions for realizability and synchronizability. In order
to model XML data, WSAT uses a guarded automata model where the guards 
of the transitions are written as XPath [14] expressions. This guarded automata 
model provides a convenient intermediate representation. The front end of
WSAT translates web services specified in BPEL [1] to this intermediate rep- 
resentation [5]. WSAT uses the explicit-state model checker SPIN [10] for LTL 
model checking by translating the guarded automata model to Promela [6]. In 
the future, we plan to investigate symbolic analysis and verification of web ser- 
vices. 



Acknowledgments. Authors are supported by NSF Career award CCR- 
1 Introduction

Verification is a hard task, but much progress has been achieved recently. Many 
verification problems have been shown decidable by reducing them to model- 
checking finite state transition systems. Verification of infinite state transition 
systems has achieved tremendous progress too, by showing that many particular 
cases were themselves decidable, such as timed automata [1] or some forms of 
pushdown-automata [4]. However, the demand for verification is growing fast, 
and the industrial needs go far beyond the verification of decidable systems. 

Verification of large, complex systems for which the task is actually unde- 
cidable is therefore an issue that must be addressed carefully. There are two 
main requirements. The first is generality: any system should be amenable to a
user-assisted treatment. The second is automaticity: decidable systems should
be processed without user-assistance. 

There are two main approaches to the verification of complex systems. 

The first is based on abstraction techniques. The system is first simplified 
by finding a suitable abstraction making verification decidable. This approach 
requires finding the abstraction first, which can be done by using a toolkit of 
possible abstractions, or by delegating the problem to the user. In the latter 
case, the abstraction must be proved correct, while the correctness follows from 
general theorems in the first case.

The second is based on theorem proving techniques. The system is first de- 
scribed by using some appropriate language, and the description compiled into 
some logical formula. The property to be verified is then itself described by a 
formula of the same language. It is finally checked with the theorem prover. 

Both approaches are compatible: abstraction techniques need theorem 
provers to verify their correctness, and theorem provers need abstraction tech- 
niques to ease their proving task. The main difference is therefore in the empha- 
sis. In the following, we concentrate on the second approach. 

* This work was partly supported by the RNTL project AVERROES. 

** Project Logical, Pole Commun de Recherche en Informatique du plateau de Saclay, 
CNRS, Ecole Polytechnique, INRIA, Universite Paris-Sud. 
2 Theorem Provers

Any theorem prover may be used for a verification task. The questions are: how
easy is that? And what confidence can we have in the result?

Let us first address the second, classical question. Proving a logical statement 
is not enough, since a yes/no answer does not carry any information about its 
actual proof. Modern theorem provers return a formal proof in case the logical 
statement is true. The proof can then be checked using a proof-checker which is 
normally part of the theorem prover. This has important design consequences: 
a theorem prover should come in two parts, a proof-search mechanism and a
proof-checker. If the logic has been appropriately designed, proof-checking is
decidable. The confidence in the result of the theorem prover therefore reduces
to the confidence in the proof-checker, which is normally a small, easily readable
piece of code which can itself be proved^1. The proof-search mechanism is normally
quite complex, since it must contain general mechanisms as well as specific ones
for automating the search for a proof for as many decidable cases as possible^2.

The first question is more complex. The requirement here is to ease the work 
of the user. As already said, the user must first specify the problem, which is a 
question of programming language. Here, we face the usual questions of
programming languages: should it be a general logical language, or a specific,
problem-oriented one? What are the appropriate logical primitives? What are the
structuring mechanisms provided by the language? Second, the user must specify the
property to be proved, normally an easy task. And third, the user should help
the system in finding a proof when an automatic search is not possible: deciding
which tactics have to be applied, and in which order, is the user’s responsibility.



3 Proposal 

We strongly believe that a sensible theorem prover should be based on a sim- 
ple logical framework whose metatheoretical properties have been thoroughly 
investigated, ensuring in particular that the logic is consistent (relative to
some formulation of set theory) and that proof-checking is decidable. This logical
framework should provide the appropriate primitives, making the tasks
of specification and proof-search easy. In particular, induction is a requirement.

3.1 The Logic 

Since we need to construct proof-terms, the logic should be based on the Curry-Howard
principle. Higher-order intuitionistic logic meets this requirement, but there is no dogma
here. More specifically, we shall use the calculus of modular inductive constructions
of Chrząszcz [5], on which the current version V8 of the proof-assistant Coq
is based. The calculus of inductive constructions is due to Coquand and Paulin.

^1 Typically, its size ranges from a few hundred to a few thousand lines of code.

^2 Being a large collection of so-called proof-tactics, its size can range from several tens of
thousands to several hundred thousand lines of code.
3.2 The Specification Language

The calculus of modular inductive constructions is a very powerful functional
language, and a very powerful type system with polymorphic, dependent and 
inductive types. This makes the specification of arbitrary problems at least as 
easy as in any pure, strongly typed functional programming language. Besides, 
the module system with parameterized functors eases structuring the specifica- 
tion, specifying the implementation of abstract structures, using parameterized 
libraries, and reusing it’s own abstract code when appropriate. Note that ab- 
stract functors may contain universal statements that can later be instantiated. 

Using such a calculus requires some expertise, as does the use of a high-level
programming language. Engineers may not want to learn all the subtleties of
that language in order to use the system. It may therefore be important to provide
an input language fitting their needs. In Coq, this input language is provided
as a library of Coq modules (in the previous sense) implementing some form
of timed automata. This library, called Calife, comes with a graphic interface
allowing the specification of transition systems directly on the screen^. Writing
such a library is a hard task, of course, since it implies proving a lot of basic
properties of the Coq implementation of the input language.

3.3 Proof-Search 

In Coq, proof-search is done via an imperative language of proof-tactics. Au-
tomating this task would be difficult, although it could be done in a PROLOG-
like manner by using a semi-decision procedure for higher-order unification. De-
pending on the user specification, proof-search could therefore be non-terminat-
ing. Restricting the format of the user specifications (as can be done with an
input language) could avoid the problem, but this issue has not been investigated
for Coq, which favours the user-interaction schema.

3.4 Incorporating Decision Procedures 

In calculi with dependent types, computations can be made transparent in proofs 
by identifying two propositions which differ by computationally equivalent sub- 
terms only via the conversion rule: if p is a proof of Q, and P, Q are convertible, 
then p is a proof of P. Proofs become easier to read, and of a manageable size. 
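
As a small illustration of computation made transparent by conversion, the following sketch uses Lean 4 rather than Coq; Lean is a different proof assistant built on a related dependent type theory, and it is used here only as an assumed stand-in for notation. In each statement both sides of the equation reduce to the same normal form, so the whole proof is a single use of reflexivity.

```lean
-- Both sides compute to the same numeral, so they are convertible
-- and reflexivity of equality is accepted as the proof term.
example : 2 + 2 = 4 := rfl

-- The same holds for a slightly larger computation.
example : 10 * 10 = 100 := rfl
```

No arithmetic reasoning steps appear in the proof term; the checker performs the computation while comparing the two sides.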

In Coq V8.1, convertibility is based upon functional application and a gen-
eralisation of higher-order primitive recursion of higher type. In a prototype
version of Coq developed by Blanqui, convertibility incorporates user-defined
higher-order rewrite rules^. In another developed by Strub, convertibility in-
corporates Shostak’s mechanism for combining decision procedures, as well as a
few simple decision procedures satisfying the required properties [3]. With these
recent developments, proofs are much easier to develop, since such a
proof-assistant can be used as a programming language for verification.

® Calife is a trademark of the project AVERROES, see http://calife.criltechnology.com 
^ These rules must of course satisfy properties that ensure consistency of the logic on
the one hand, and decidability of type-checking on the other hand [2].

3.5 Using Coq for Verification

There have been many experiments in using Coq for verifying complex systems:
Various telecommunications or security protocols have been analysed, some
of them by industrial teams (France-Telecom, Trusted Logics, Schlumberger)
with the help of the Calife interface, such as ABR, PGM and PIM.

Imperative software written in a powerful input language including higher-
order functions, polymorphism, references, arrays and exceptions can be verified
with Why, a verification-condition generator front-end for Coq.

Security properties of large software systems have been proved with Kraka-
toa (for JAVA) and Caduceus (for C), which are both front-ends for Why that
translate their input into Why's language (see http://why.lri.fr/index.en.html).

4 Future Developments 

Because the conversion rule may include computations of an arbitrary complexity
(full higher-order arithmetic can be encoded in Coq), it becomes important to
compile reduction rather than to interpret it. Gonthier and Werner faced this
problem with the proof of the four colour theorem, for which they realized that
checking all particular cases (by computation, so that the proof term reduces to
a single use of reflexivity of equality) would take forever. A prototype version
of Coq is provided with a compiler for computations, which makes the use of
conversion feasible for proofs involving large computations. It remains to extend
the compiler in order to incorporate the recent extensions of conversion.

For secure applications, it is important to extract the code from the proof
rather than to prove an a priori given code. Coq is provided with a proof extrac-
tion mechanism. Because of smart-card applications, the code may be resource
sensitive. We are currently developing a version of extraction in which both the
code and its complexity (in time and space) will be extracted from the proof.

Some properties of systems may not be true, but may be true with probability
one, a problem we had to face with the protocol PGM. We are also developing
an extension of Coq in order to prove probabilistic properties of deterministic
protocols, and also to prove properties of randomized algorithms.
Acknowledgements. To the past and current members of the LogiCal project. 

1 Introduction

There is a growing interest in adopting formal specifications in software develop- 
ment [1][2], but meanwhile the trend is also constrained by the lack of effective, 
practical techniques for validating and verifying formal specifications [3]. An 
obvious advantage of formal specifications over informal specifications is that 
the formal specifications can be rigorously analyzed with tool support to ensure 
their consistency and validity. There are several analysis techniques proposed by 
researchers in the literature, such as formal proof [4], specification animation [5], 
model checking [6], specification testing [7], and rigorous reviews [8], but among 
those techniques, the most commonly used one in practice that emphasizes the 
human role is rigorous reviews [9]. 

Review is a traditional technique for static analysis of software to detect 
faults possibly resulting in the violation of the consistency and validity of soft- 
ware systems. Basically, software review means to check through software either 
by a team or an individual. Since software means both program and its related 
documentation, such as functional specification, abstract design, and detailed de- 
sign, a review can be conducted for all levels of documentation [10] [11] [12] [13]. 
Compared to formal proof, review is less rigorous, but emphasizes the impor-
tance of the human role in verification and validation. Since software development
is really a human activity involving many important human judgements and de-
cisions, review is an effective technique for software verification and validation
in practical software engineering [14] [9]. Furthermore, when dealing with non-
terminating software and/or non-executable formal specifications (e.g., VDM, Z,
SOFL), review has obvious advantages over the techniques that require execu-
tion of software systems (e.g., testing). The question is how to improve the rigor
and effectiveness of review techniques.

Ensuring the consistency of formal specifications is one of the most desirable 
goals to achieve in software development with formal methods. In this paper 
we describe an automated rigorous review method for verifying and validating 

* This paper is an extended abstract of my talk at ATVA2004. The work is supported 
by the Ministry of Education, Culture, Sports, Science, and Technology of Japan 
under Grant-in- Aid for Scientific Research on Priority Areas (No. 16016279). 
formal specifications written in SOFL (Structured Object-oriented Formal Lan-
guage) [15]. The essential idea of the method is to automatically derive all the
critical properties of a specification first and then review the specification to 
ensure that it satisfies all the properties. The review method includes four steps: 
(1) deriving critical properties as review targets on the basis of the structure 
of the specification, (2) building a review task tree to present all the necessary 
review tasks for each property, (3) performing reviews based on the review task 
tree, and (4) evaluating the review results to determine whether faults are de-
tected or not. The critical properties offer what to check in the specification;
the review task tree (RTT) provides a notation to represent all the review tasks; 
performing reviews based on the review task tree allows the reviewer to focus 
on each individual task each time; and the evaluation of the review results is 
carried out automatically on the basis of the tree logic and the review results 
of all the atomic tasks on the tree. In order to support all the activities of the 
method, we built a prototype software tool that can guide a reviewer to apply 
the method to review a specific specification. 

2 Critical Properties 

Since data and operations are two important aspects of a specification, the review 
of the specification should focus on these two aspects. As mentioned previously, 
our approach is to derive all the critical properties from the specification first 
and then to review the specification to ensure that it satisfies the properties. For 
this reason, the critical properties should reflect the constraints imposed by both 
the user and the semantics of the SOFL specification language. The important 
critical properties for a model-oriented formal specification include: 

— Internal consistency 

— Invariant-conformance consistency 

— Satisfiability 

— Integration consistency 

The internal consistency ensures that the use of expressions does not vio- 
late the related syntactic and semantic constraints imposed by the specification 
language. Invariant-conformance consistency deals with the consistency between 
operations and the related type and/or variable invariants. Satisfiability is a 
property requiring that satisfactory output be generated based on input by an 
operation under its precondition. Integration consistency ensures that the inte- 
gration of operations to form a more powerful operation keeps the consistency 
among their interfaces. The formal definitions of these properties are described 
in our previous publication [16]. 

3 Brief Introduction to RTT 

An RTT is a systematic graphical notation for representing review tasks for a 
property described above. It is designed by simplifying the fault tree notation 
traditionally used for safety analysis of safety-critical systems [17]. Each node of
a review task tree represents a review task, defining what to do about a property, 
and it may be connected to “child nodes” in different ways, depending on the 
type of the node. There are two kinds of review tasks. One is “the property 
involved holds” and another is “the property involved can hold”. The former 
is represented by a rectangle node, while the latter is represented by a round- 
edged rectangle node. A simple example of RTT is given in Figure 1 to illustrate 
the general idea of an RTT. The top-level task is can_hold(A) and it is ensured
by two sub-tasks: hold(B) and hold(C). The task hold(B) is ensured if one of
the sub-tasks hold(D) and hold(E) is performed correctly.




Fig. 1. A simple example of RTT 



An RTT of a property is constructed based on its structure. Since a property 
is expressed as a predicate expression, the problem of how to derive an RTT from 
the property becomes the issue of how to transform the predicate expression to 
an RTT that provides more effective structure and information to the reviewer 
for performing reviews. For the purpose of transformation, the following strategy 
can be applied: 

— For a compound property (predicate expression), review its constituent pred- 
icates first and then its combination (if necessary). 

— For an atomic predicate (a relation or its negation), review whether the set 
of values constrained by the predicate is empty or not. Such a set must be
given in the form of a set comprehension so that the types of the involved variables
in the predicate can be clearly indicated, which is helpful to the reviewer.

The general rules for drawing review task trees from logical expressions are
given in our previous work [16]. It is worth noticing that an RTT for each
property not only shows graphically the same logic as the property, but
also indicates what to review in order to ensure that the property can hold or holds,
depending on the type of the task required. For example, to ensure the task
can_hold($A \wedge B$) (meaning that $A \wedge B$ is satisfiable), the corresponding RTT
shows that we must first review whether $A$ and $B$ can hold individually, and
then review whether their conjunction (common part) can hold.
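
The automatic evaluation of review results from the tree logic can be sketched as follows. This is a minimal illustration, not the tool's implementation: the `ReviewTask` class, the `"and"`/`"or"` connective names and the node labels are all hypothetical, and the tree in `example_tree` only mirrors the shape of the simple example (a top task ensured by two sub-tasks, one of which is ensured by either of two further sub-tasks).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewTask:
    """One node of a review task tree (hypothetical representation)."""
    label: str
    connective: str = "atomic"          # "atomic", "and", or "or"
    children: List["ReviewTask"] = field(default_factory=list)
    verdict: Optional[bool] = None      # reviewer's answer, atomic tasks only


def evaluate(task: ReviewTask) -> bool:
    """Combine the reviewer's atomic verdicts according to the tree logic."""
    if task.connective == "atomic":
        if task.verdict is None:
            raise ValueError(f"task {task.label!r} has not been reviewed")
        return task.verdict
    results = [evaluate(child) for child in task.children]
    if task.connective == "and":
        return all(results)
    if task.connective == "or":
        return any(results)
    raise ValueError(f"unknown connective {task.connective!r}")


def example_tree(v1: bool, v2: bool, v3: bool) -> ReviewTask:
    """Top task needs both children; the first child needs either leaf."""
    either = ReviewTask("either", "or", [
        ReviewTask("leaf1", verdict=v1),
        ReviewTask("leaf2", verdict=v2),
    ])
    return ReviewTask("top", "and", [either, ReviewTask("leaf3", verdict=v3)])
```

A reviewer only answers the atomic tasks; a `False` result at the root indicates that a fault may have been detected in the property under review.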

4 A Prototype Tool



The proposed review method is rigorous, but at the same time it requires
great attention and skill from reviewers during a review process. Without
powerful tool support, the method seems difficult to apply to large-scale
systems. To attack this problem, we built a prototype tool to support almost
all of the four steps suggested in the method. Although the tool is still
a prototype, it shows the great possibility of building a more powerful and com-
prehensive software tool in the future to support the entire process of the review
method. For now the tool offers the following major services:

— Automatically making a property list on the GUI of the tool based on a 
property file. For each property on the list, its number, content, and the 
status of being reviewed are clearly shown to indicate the review progress. 

— Automatically generating an RTT for a property on the list on the basis of 
the RTT generation rules. 

— Ensuring that an RTT is well-drawn on the drawing window when it is 
automatically generated from a property. The RTT can also be edited to 
ensure desired properties. 

— Automatically guiding the review process by highlighting a review task each 
time and providing the final review report to indicate whether there are faults
in the property.

— Automatic hyperlinking among the properties on the property list, review 
tasks in the RTT, and the properties in the property file resulting from 
the formal specification under review. This function makes it considerably easier
for the reviewer to quickly find the information about the involved variables,
expressions, and their roles in the specification of a process in order to make
accurate decisions in the review process.



5 Conclusion 

The rigorous review method described in this paper offers a systematic approach 
to supporting the review of formal specifications. A prototype tool for the method 
is built to support the entire review process. Our research can benefit both the 
conventional review methods by introducing rigor to them and formal methods 
by demonstrating the usefulness of formal specifications to the improvement of 
software design quality. 

We will continue our on-going work on the construction of the tool for the 
method, and will conduct large-scale case studies to help the improvement of 
both the review method and the tool in the future. 



Acknowledgement. We would like to thank all the students who contributed 
to the implementation of the prototype tool. 

Abstract. The large number of program variables in a software
verification model often makes model checkers ineffective. Since the
performance of BDDs is very sensitive to the number of variables,
BDD-based model checking is deficient in this regard. SAT-based model 
checking shows some promise because the performance of SAT-solvers is 
less dependent on the number of variables. As a result, SAT-based tech- 
niques often outperform BDD-based techniques in discrete systems with 
a lot of variables. Timed systems, however, have not been as thoroughly 
investigated as discrete systems. The performance of SAT-based model 
checking in analyzing timing behavior - an essential task for verifying 
real-time systems - is not so clear. Moreover, although SAT-based
model checking may be useful in bug hunting, its capability in proving
properties has often been criticized. To address these issues, we propose
a new bounded model checker, xBMC, to solve the reachability problem 
of dense-time systems. In xBMC, regions and transition relations are 
represented as Boolean formulae via discrete interpretations. To support 
both property refutation and verification, a complete inductive algo- 
rithm is deployed, in addition to the requirement of reaching an intrinsic 
threshold, i.e. the number of regions. In an experiment to verify the 
client authentication protocol of Cornell Single Sign-on systems, xBMC 
outperforms the efficient model checker, RED [35], even if no bugs exist. 
We believe that xBMC may provide an effective and practical method 
for timing behavior verification of large systems. 

Keywords: Induction, Verification, Model checking, Region automata, 
Real-time systems, BMC. 



1 Introduction 

Successful model checking for software verification mandates the ability to han- 
dle a large number of program variables. Because the size of binary decision 
diagrams (BDDs) may grow rapidly as the number of variables increases, soft- 
ware verification remains a difficult problem for conventional BDD-based model 
checkers. On the other hand, satisfiability (SAT) solvers are less sensitive to the 
number of variables and SAT-based model checking, i.e. bounded model check-
ing (BMC), is showing some promise in this regard [9] [13]. As Niebert et al. [30]
suggested, BMC has benefits in terms of bug hunting, especially for systems too 
large for complete verification, even though it is less efficient in guaranteeing the 
correctness of software systems. 

A recent comparison [5] of the two techniques shows that BDD-based model 
checkers require more space, but SAT-based model checkers require more time. 
As a result of numerous proposals for improving the efficiency of SAT solvers 
[23] [26], the performance of BMC has improved automatically. Consequently,
BMC has recently gained acceptance in the research community, especially for 
software verification [14] [18]. However, for the analysis of timing behavior, which 
is considered essential for verifying embedded systems, protocol implementations 
and many other types of software, the advantages of SAT-based BMC are less 
clear. A fundamental problem with BMC is its lack of support for timing be- 
havior modeling. We have addressed this issue in [38], where we applied BMC 
techniques to region automata and encoded the implicit simulation of a region- 
based state exploration algorithm as Boolean formulae. In that project we not 
only characterized regions as combinations of discrete interpretations, but also 
precisely encoded the settings of these interpretations as Boolean formulae. We 
proved that the satisfiability of these Boolean formulae is equivalent to solving 
the forward reachability problem of dense-time systems, within steps bounded 
by the number of regions. 

Although SAT-based verification techniques are very useful in bug hunting, 
their capability in proving properties has often been criticized. An inductive 
method offers SAT-based verification an opportunity to prove safety properties 
efficiently. The basic idea, just like mathematical induction, is to prove the safety 
property for all steps by assuming the properties of the previous steps. Previous 
research has been devoted to this issue [10] [15] [28] [33], but none of it supports 
timing behavior. 

In this paper, we extend the above research to timed systems. By applying a 
loop-free inductive method to BMC, we implement xBMC for inductive reacha- 
bility analysis of region automata. When the inductive method is effective, it can 
verify the given safety property within a handful of steps, regardless of the diam- 
eter of the reachability graph. Compared to research on the encoding of timing 
behavior [7] [25] [29] [31] [31], discretization in [38] allows us to deploy the induc- 
tive method rather straightforwardly. Compared to conventional model checkers 
[11] [22] [35], we provide a more effective and practical method to alleviate state 
explosion, especially for those systems too large to verify. 

Our experiments verify the correctness of the client authentication protocol 
of Cornell Single Sign-on systems (CorSSO) [19]. The experimental results show
that xBMC is more efficient than RED [35] for correctness guarantee, as well as 
for bug hunting. 

The rest of this paper is organized as follows. In Section 2 we briefly de- 
scribe timed automata having both discrete and clock variables. In Section 3 
we describe our previous effort that encodes the behavior of region automata as 
Boolean formulae. An inductive algorithm is given in Section 4. An inductive
reachability analysis is given in Section 5, and experimental results are summa- 
rized in Section 6. After discussing related works in Section 7, we present our 
conclusions in Section 8. 



2 Timed Automata 

A timed automaton (TA) [1][2][4][37] is an automaton with a finite set of clock 
variables. Its behavior consists of a) alternating discrete transitions that are 
constrained by guarded conditions on discrete and clock variables and b) time 
passage in which the automaton remains in one state, while clock values increase 
at a uniform rate. For clarification of discretization purposes, we use a TA that 
contains both discrete and clock variables, rather than one that models the 
discrete parts as locations. 



2.1 Constraint and Interpretations 

For a set $D$ of discrete variables and a set $X$ of clock variables, the set $\Phi(D, X)$ of
constraints $\varphi$ is defined by: $\varphi := \mathit{ff} \mid d = q \mid x \sim c \mid \neg\varphi \mid \varphi_1 \vee \varphi_2$, where $d \in D$ and
$q \in \mathrm{dom}(d)$, $x \in X$, ${\sim} \in \{<, =, \le\}$, and $c \in \mathbb{N}$ is a non-negative integer. Typical
short forms are: $\mathit{tt} \equiv \neg\mathit{ff}$, $\varphi_1 \wedge \varphi_2 \equiv \neg((\neg\varphi_1) \vee (\neg\varphi_2))$ and $\varphi_1 \rightarrow \varphi_2 \equiv \neg\varphi_1 \vee \varphi_2$.
A discrete interpretation $s$ assigns each discrete variable a non-negative integer
that represents one value from its predefined domain, i.e. $s : D \mapsto \mathbb{N}$. A clock
interpretation $\nu$ assigns a non-negative real value to each clock, i.e. $\nu : X \mapsto \mathbb{R}^+$.
We say that an interpretation pair $(s, \nu)$ satisfies constraint $\varphi$ if and only if $\varphi$ is
evaluated as true, according to the values given by $(s, \nu)$.

2.2 Timed Automata 

A TA is a tuple $(D, X, A, I, E)$, where:

1. $D$ is a finite set of discrete variables, with each $d \in D$ having a predefined
finite domain denoted by $\mathrm{dom}(d)$,

2. $X$ is a finite set of clock variables,

3. $A$ is an action set with each $\tau \in A$ consisting of a finite series of discrete
variable assignments,

4. $I$ specifies an initial condition, and

5. $E \subseteq \Phi(D, X) \times A \times 2^X$ is a finite set of edges. An edge $e : (\varphi, \tau, \lambda) \in E$
represents the transition consisting of: $\varphi \in \Phi(D, X)$ as a triggering condition
which specifies when the transition can be fired, $\tau \in A$ as the action that
changes the current discrete interpretation into the next one, and $\lambda \subseteq X$ as
the set of reset clocks.

For an action $\tau$, $s[\tau]$ denotes the discrete interpretation after applying $\tau \in A$ to $s$.
For $\delta \in \mathbb{R}^+$, $\nu + \delta$ denotes the clock interpretation that maps each clock $x$ to the
value $\nu(x) + \delta$. For $\lambda \subseteq X$, $\nu[\lambda]$ denotes the clock interpretation that assigns 0 to
each $x \in \lambda$ and agrees with $\nu$ over the rest of the clocks. The essence of a TA is
a transition system $(Q, \rightarrow)$, where $Q$ is the set of states and $\rightarrow$ is the transition
relation. A state of a TA is a pair $(s, \nu)$ such that $s$ is a discrete interpretation
of $D$ and $\nu$ is a clock interpretation of $X$. We say $(s, \nu)$ is an initial state if
$s$ maps discrete variables to values that satisfy $I$ and $\nu(x) = 0$ for all $x \in X$.

There are two types of $\rightarrow$, i.e. $\xrightarrow{\delta}$ and $\xrightarrow{e}$:

1. For a state $(s, \nu)$ and an increment $\delta \in \mathbb{R}^+$, $(s, \nu) \xrightarrow{\delta} (s, \nu + \delta)$.

2. For a state $(s, \nu)$ and an edge $e : (\varphi, \tau, \lambda)$ such that $(s, \nu)$ satisfies $\varphi$, $(s, \nu) \xrightarrow{e} (s[\tau], \nu[\lambda])$.

A run $r : (s_0, \nu_0) \rightarrow (s_1, \nu_1) \rightarrow \cdots$ of a TA is an infinite sequence of states and
transitions, where for all $i \in \mathbb{N}$, $(s_i, \nu_i) \in Q$. An arbitrary interleaving of the two
transition types is permissible. A state $(s', \nu')$ is reachable from $(s, \nu)$ if it belongs
to a run starting at $(s, \nu)$. Let $\mathrm{Run}(s, \nu)$ denote the set of runs starting at $(s, \nu)$.
We define $\mathrm{Reach}(s, \nu) := \{(s', \nu') \mid \exists r \in \mathrm{Run}(s, \nu) \text{ and } i \in \mathbb{N},\ (s_i, \nu_i) = (s', \nu')\}$ as
the set of states reachable from $(s, \nu)$.
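
The two transition types can be illustrated with a small executable sketch. Everything named here is an assumption made for illustration: states are pairs of Python dictionaries, guards and actions are plain functions, and the one-edge automaton `EDGE` (fire when `mode` is 0 and clock `x` is at least 2, then set `mode` to 1 and reset `x`) is a made-up example rather than one taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

Discrete = Dict[str, int]      # discrete interpretation s
Clocks = Dict[str, float]      # clock interpretation nu
State = Tuple[Discrete, Clocks]


@dataclass(frozen=True)
class Edge:
    guard: Callable[[Discrete, Clocks], bool]  # triggering condition
    action: Callable[[Discrete], Discrete]     # maps s to s[tau]
    resets: FrozenSet[str]                     # set of reset clocks


def delay(state: State, delta: float) -> State:
    """Time passage: every clock advances by delta, s is unchanged."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    s, nu = state
    return s, {x: t + delta for x, t in nu.items()}


def fire(state: State, edge: Edge) -> State:
    """Discrete transition: apply the action and reset the listed clocks."""
    s, nu = state
    if not edge.guard(s, nu):
        raise ValueError("triggering condition is not satisfied")
    return edge.action(s), {x: (0.0 if x in edge.resets else t)
                            for x, t in nu.items()}


# Hypothetical edge: guard is mode = 0 and not (x < 2).
EDGE = Edge(
    guard=lambda s, nu: s["mode"] == 0 and not (nu["x"] < 2),
    action=lambda s: {**s, "mode": 1},
    resets=frozenset({"x"}),
)
```

Starting from `({"mode": 0}, {"x": 0.0})`, a delay of 2.5 followed by `fire(..., EDGE)` is one finite prefix of a run in the sense defined above.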

3 Boolean Encoding of Region Automata 

System states change as time progresses, but some changed states are not 
distinguishable by constraints. Based on this observation, Alur et al. [2] de- 
fined the equivalence of clock interpretations and proposed region graphs for 
the verification of timed automata. We represent the set of clock assignments
of an equivalence class as $(\nu_d, \nu_f)$, a pair of discrete interpretations mapping
integral parts and fraction orderings of clock assignments respectively. Given
an equivalence class $[\nu]$, integral parts of the clock assignments stand for the
discrete interpretation $\nu_d$ in (1), which maps each clock $x \in X$, assuming
$\nu(x) = t$ and $t = \lfloor t \rfloor + \mathrm{frac}(t)$, into an integer representing an interval from
$\{[0, 0], (0, 1), [1, 1], \ldots, (c_x - 1, c_x), [c_x, c_x], (c_x, \infty)\}$.

$$\nu_d(x) = \begin{cases} 2\lfloor t \rfloor, & \text{if } \lfloor t \rfloor \le c_x \wedge \mathrm{frac}(t) = 0 \\ 2\lfloor t \rfloor + 1, & \text{if } \lfloor t \rfloor < c_x \wedge \mathrm{frac}(t) \neq 0 \\ 2c_x + 1, & \text{otherwise} \end{cases} \tag{1}$$

Given a discrete interpretation $\nu_d$, we define $\mathrm{Inv}(\nu_d) := \{x \mid \exists k \in \mathbb{N},\ k < c_x,\ \nu_d(x) = 2k + 1\}$
as the set of clocks having non-zero fractions. Then, the discrete
interpretation $\nu_f$ in (2) maps each clock pair $(x, y)$, where $x, y \in \mathrm{Inv}(\nu_d)$ and
$x < y$, into a relation from $\{\prec, \succ, \approx\}$. Note that $\nu_f$ stands for the fraction
ordering of an equivalence class $[\nu]$.

$$\nu_f(x, y) = \begin{cases} \prec, & \text{if } \mathrm{frac}(\nu(x)) < \mathrm{frac}(\nu(y)) \\ \succ, & \text{if } \mathrm{frac}(\nu(x)) > \mathrm{frac}(\nu(y)) \\ \approx, & \text{if } \mathrm{frac}(\nu(x)) = \mathrm{frac}(\nu(y)) \end{cases} \tag{2}$$
It’s clear that $(\nu_d, \nu_f)$ exactly represents $[\nu]$, while $\nu_d$ and $\nu_f$ are defined in (1)
and (2) respectively. For example, an equivalence class $(1 < x < y < 2) \wedge z = 1$
is represented by the pair $(\nu_d, \nu_f)$, where $\nu_d(x) = 3 \wedge \nu_d(y) = 3 \wedge \nu_d(z) = 2 \wedge \nu_f(x, y) = {\prec}$.
Accordingly, a region, $(s, [\nu])$, can be precisely represented as an
interpretation state, $(s, \nu_d, \nu_f)$, where three discrete interpretations $s : D \mapsto \mathbb{N}$,
$\nu_d : X \mapsto \mathbb{N}$ and $\nu_f : \mathrm{Inv}(\nu_d) \times \mathrm{Inv}(\nu_d) \mapsto \{\prec, \succ, \approx\}$ are involved.
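
The mapping from a concrete clock interpretation to its discrete pair can be sketched in a few lines. The function names `nu_d` and `encode`, the use of the strings `"<"`, `">"`, `"="` for the three fraction orderings, and ordering clock pairs by name are all assumptions made for this illustration; the sketch follows definitions (1) and (2) as read above.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def nu_d(t: float, c_x: int) -> int:
    """Integer code of the interval containing t, for largest constant c_x."""
    whole = math.floor(t)
    has_fraction = t != whole
    if whole <= c_x and not has_fraction:
        return 2 * whole            # point interval [whole, whole]
    if whole < c_x and has_fraction:
        return 2 * whole + 1        # open interval (whole, whole + 1)
    return 2 * c_x + 1              # beyond the largest constant


def encode(nu: Dict[str, float], c: Dict[str, int]
           ) -> Tuple[Dict[str, int], Dict[Tuple[str, str], str]]:
    """Return (nu_d, nu_f) for a clock interpretation nu."""
    d = {x: nu_d(t, c[x]) for x, t in nu.items()}
    # Inv: clocks with code 2k + 1 where k < c_x, i.e. odd codes below 2*c_x + 1.
    inv = sorted(x for x in nu if d[x] % 2 == 1 and d[x] < 2 * c[x] + 1)
    f = {}
    for x, y in combinations(inv, 2):   # pairs with x before y
        fx = nu[x] - math.floor(nu[x])
        fy = nu[y] - math.floor(nu[y])
        f[(x, y)] = "<" if fx < fy else ">" if fx > fy else "="
    return d, f
```

With every largest constant set to 2, the assignment `x = 1.25, y = 1.75, z = 1.0` lies in the class $(1 < x < y < 2) \wedge z = 1$ and is encoded with codes 3, 3 and 2 and the ordering `"<"` for the pair `(x, y)`.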

A TA’s transition system can then be represented by a finite discrete inter-
pretation graph $(Q_{si}, \Rightarrow)$, where $Q_{si}$ is the set of interpretation states and $\Rightarrow$ is
the interpretation transition relation. There are two types of $\Rightarrow$, i.e. $\Rightarrow_\delta$ and $\Rightarrow_e$:

1. Given a state $(s, \nu_d, \nu_f)$, $(s, \nu_d, \nu_f) \Rightarrow_\delta (s', \nu_d', \nu_f')$ if and only if $\psi_{time}$ is eval-
uated as true, according to the values given by $(s, \nu_d, \nu_f, s', \nu_d', \nu_f')$.

2. Given a state $(s, \nu_d, \nu_f)$ and an edge $e : (\varphi, \tau, \lambda)$, $(s, \nu_d, \nu_f) \Rightarrow_e (s', \nu_d', \nu_f')$
if and only if $\psi_{tran}$ is evaluated as true, according to the values given by
$(s, \nu_d, \nu_f, s', \nu_d', \nu_f')$.

$\psi_{time}$ defines the successor relation formula for capturing a region moving
into a subsequent region due to time passage, while $\psi_{tran}$ defines the discrete
transition formula for triggering an edge using discrete interpretations. Then the
transition relation formula $T$ is $\psi_{time} \vee \psi_{tran}$. In [38], we proposed formal defini-
tions and encoding methods of these formulae, and proved that these encodings
represent the exact behavior of region automata.

Our state variables $B$ are given in (3), in which a set of Boolean variables
is used to encode interpretation states. Given each discrete variable’s domain
and each clock’s largest constraint value, the number of state variables, i.e. $|B|$,
equals $\sum_{d \in D} \lceil \lg |\mathrm{dom}(d)| \rceil + \sum_{x \in X} \lceil \lg (2c_x + 2) \rceil + |X|\,(|X| - 1)$. To perform BMC, we
add a copy of $B$ to the set of state variables at each iteration.

$$B = \{ b_d^k \mid d \in D,\ 0 \le k < \lceil \lg |\mathrm{dom}(d)| \rceil \} \cup \{ b_x^k \mid x \in X,\ 0 \le k < \lceil \lg (2c_x + 2) \rceil \} \cup \{ b_{xy}^k \mid x, y \in X,\ x < y,\ k \in \{0, 1\} \} \tag{3}$$

Finally, we build a circuit representation to translate a bit-vector logic (used
to build the equation for a concrete transition relation) into conjunctive normal
form (CNF).
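
As a worked instance of the state-variable count, assume (purely for illustration) one discrete variable $d$ with $|\mathrm{dom}(d)| = 4$ and two clocks $x, y$ with $c_x = c_y = 2$, and read the count as $\sum_{d} \lceil \lg |\mathrm{dom}(d)| \rceil + \sum_{x} \lceil \lg (2c_x + 2) \rceil + |X|\,(|X| - 1)$:

$$|B| = \lceil \lg 4 \rceil + \bigl(\lceil \lg 6 \rceil + \lceil \lg 6 \rceil\bigr) + 2 \cdot (2 - 1) = 2 + (3 + 3) + 2 = 10.$$

Under these assumptions, unfolding the transition relation for $k$ steps would introduce $10\,(k + 1)$ state variables, one copy of $B$ per step.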

4 Induction 

Although SAT-based model checking is very useful for bug hunting [9] [13] [38], its 
ability to prove properties is often criticized. The inductive method offers SAT- 
based model checking an opportunity to prove safety properties efficiently. The 
basic idea, like mathematical induction, is to prove the safety property for all 
steps by assuming the property on previous steps. Here, we briefly illustrate the 
technique. Interested readers are referred to [10] [15] [28] [33] for further discussion.
Let $q_0 \in Q$, where $q_0$ is the initial state and $P(\cdot)$ a predicate over states in $Q$.
We would like to prove by induction that $P(q)$ holds for any state $q$ reachable
from $q_0$. Firstly, we verify whether $P(q_0)$ holds. If not, an error is reported for $q_0$.
If $P(q_0)$ holds, we check whether $P(q) \wedge (q \rightarrow q') \wedge \neg P(q')$ can be satisfied, i.e.
whether it is possible to reach a state $q'$ that does not satisfy $P(\cdot)$ from any state
$q$ satisfying $P(\cdot)$. If it is impossible, we know that $P(\cdot)$ must hold for any state
in $\mathrm{Reach}(q_0)$. To see this, recall that $P(q_0)$ holds. We argue that all successors
of $q_0$ must satisfy $P(\cdot)$. If not, $P(q) \wedge (q \rightarrow q') \wedge \neg P(q')$ must be satisfiable for
some successor $q'$, which is a contradiction. Similarly, we can prove all states
reachable from $q_0$ in two steps must satisfy $P(\cdot)$, and so on. Hence we conclude
all reachable states must satisfy $P(\cdot)$.
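
The depth-one argument can be replayed on a small explicit transition system. This is only a sketch: it enumerates states directly instead of calling a SAT solver, and the function name `simple_induction`, its return strings, and the six-state example are assumptions introduced here.

```python
from typing import Callable, Iterable, List


def simple_induction(states: Iterable[int],
                     init: int,
                     successors: Callable[[int], List[int]],
                     P: Callable[[int], bool]) -> str:
    """Depth-one induction over an explicitly enumerated state space."""
    # Base case: the initial state must satisfy P.
    if not P(init):
        return "error at initial state"
    # Inductive step: is P(q) and (q -> q') and not P(q') satisfiable?
    for q in states:
        if P(q) and any(not P(q2) for q2 in successors(q)):
            return "inconclusive"   # a P-state has a bad successor
    return "proved"


def step(q: int) -> List[int]:
    """Hypothetical system: six states, each moving two positions ahead."""
    return [(q + 2) % 6]
```

For the property "the state is even", the inductive step is unsatisfiable and the property is proved. For the property "the state is not 5", the step is satisfiable through the transition from 3 to 5, although neither state is reachable from 0, so depth-one induction is inconclusive even though the property holds on all reachable states.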

In the example, the depth of the inductive step is one. We call it the simple 
induction. However, the capability of simple induction is very limited. Just like 
mathematical induction, it may be necessary to assume several previous steps in 
order to prove the property. In the literature, induction with arbitrary depths has 
been proposed. Unfortunately, the inductive technique cannot prove all safety 
properties, even with arbitrary depths. In [10] [15] [28] [33], various mechanisms 
are proposed to make induction complete. Here, we use loop-free induction. In 
loop-free induction, additional constraints are applied to prevent loops. Consider
a self-loop transition, followed by a state that does not satisfy $P(\cdot)$. The prob-
lematic state can always be reached by an inductive step of arbitrary depth. It
suffices to consider a short path leading to the problematic state and still prove
the soundness and completeness of the induction. By requiring all previous states
to be distinct, loop-free induction eliminates unsubstantial paths. Based on the
discretization scheme in Section 3, we can deploy loop-free induction to speed
up the verification of safety properties. In Figure 1, which shows the flow of the
inductive method, $B_i$ represents the state variables of step $i$ and $P(B)$ is true
when the valuation of $B$ satisfies the predicate $P(\cdot)$.



Induction(P, MaxBound)
var i:0..MaxBound;
begin
  i:=0;
  loop forever
    if (¬SAT(P(B_0) ∧ (B_0 ⇒ B_1) ∧ ... ∧ P(B_i) ∧ (B_i ⇒ B_{i+1}) ∧
        ¬P(B_{i+1}) ∧ ⋀_{0≤j<k≤i} (B_j ≠ B_k))) return “success”;
    if (i=MaxBound) return “fail within MaxBound steps”;
    i:=i+1;
end.



Fig. 1. Flow of Inductive Method 
The flow tries to establish the loop-free inductive step within a given bound.
The inductive step essentially checks whether it is possible to reach any bad state 
following several safe states. If it is impossible, we stop; otherwise, the length is 
increased and the algorithm repeats the loop. 
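
The loop-free inductive step can be mimicked on an explicit graph by enumerating paths instead of solving a Boolean formula. The sketch below is an illustration under assumptions: `step_holds`, `induction` and the six-state example are hypothetical names, and the base case (no bad state within the first steps from the initial state) is assumed to be checked separately by the bounded search.

```python
from typing import Callable, Iterable, List


def step_holds(states: Iterable[int],
               successors: Callable[[int], List[int]],
               P: Callable[[int], bool],
               i: int) -> bool:
    """True if no path of i+1 distinct P-states is followed by a bad state."""
    def counterexample(path: List[int]) -> bool:
        last = path[-1]
        if len(path) == i + 1:
            # All i+1 safe states are in place: look for a bad successor.
            return any(not P(q) for q in successors(last))
        # Extend with a safe successor that is not already on the path.
        return any(P(q) and q not in path and counterexample(path + [q])
                   for q in successors(last))
    return not any(P(q) and counterexample([q]) for q in states)


def induction(states: List[int],
              successors: Callable[[int], List[int]],
              P: Callable[[int], bool],
              max_bound: int) -> str:
    """Increase the depth until the loop-free inductive step is established."""
    for i in range(max_bound + 1):
        if step_holds(states, successors, P, i):
            return "success"
    return "fail within MaxBound steps"


def step(q: int) -> List[int]:
    """Hypothetical system: six states, each moving two positions ahead."""
    return [(q + 2) % 6]
```

For the property "the state is not 5", depths 0 and 1 fail because of the paths 3, 5 and 1, 3, 5, but at depth 2 no path of three distinct safe states leads to 5, so the step is established without exploring the whole graph.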

5 Reachability Analysis 

In this section, we describe how we deal with the reachability problem by solving 
the satisfiability problem of an expanding Boolean formula iteratively. Moreover, 
we show how to apply loop-free induction in BMC efficiently. 



5.1 Bounded Reachability Analysis 

Given an initial condition, a risk condition, a transition condition and an
integer bound, we solve the bounded reachability problem by iteratively calling the
SAT solver. We unfold the transition relation until the SAT solver
returns a truth assignment or the bound is reached. Let $I(B_i)$ and $R(B_i)$
respectively denote the CNF formulae of the given initial and risk conditions over $B_i$.
The implementation of BoundedFwdReach() is given in Figure 2. By conjoining
the formula with the negation clause of the risk condition, each intermediate
result is saved for use in later iterations to speed up the decision procedure of
the SAT solver. In the next section, we also show that this conjunction can be
used directly to apply the inductive method.

If the risk state is reachable, the formula will be satisfied at the $i$-th step, and
a truth assignment will be returned by the SAT solver. The procedure will then
terminate and generate a counterexample. The formula will keep on expanding
if a risk state is not reached. Therefore, if the risk state is unreachable, the
procedure terminates when either MaxBound is reached, or memory is exhausted.
Given a TA having $n$ regions, the final formula will contain



BoundedFwdReach(I, R, MaxBound)
var i:0..MaxBound;
begin
  i:=0; F:=I(B_0);
  loop forever
    if (SAT(F ∧ R(B_i))) return “reachable”;
    if (i=MaxBound) return “unreachable within MaxBound steps”;
    F:=F ∧ ¬R(B_i) ∧ (B_i → B_{i+1});
    i:=i+1;
end.



Fig. 2. Bounded Forward Reachability Analysis 
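
As a rough illustration of the procedure in Fig. 2, the following sketch replaces the SAT call by an explicit frontier of states. The function name `bounded_fwd_reach`, the successor function and the toy counter system are assumptions made for this example only.

```python
def bounded_fwd_reach(init, risk, succ, max_bound):
    """Explicit-state analogue of BoundedFwdReach: unfold the transition
    relation step by step until a risk state appears or the bound is hit."""
    frontier = set(init)  # states reachable in exactly i steps via non-risk states
    for i in range(max_bound + 1):
        if any(risk(s) for s in frontier):
            return "reachable", i
        # Corresponds to F := F ∧ ¬R(B_i) ∧ (B_i → B_{i+1}); every state in
        # the frontier is non-risk here, otherwise we would have returned.
        frontier = {t for s in frontier for t in succ(s)}
    return "unreachable within MaxBound steps", max_bound


# Hypothetical toy system: a counter modulo 8 that starts at 0.
def step(s):
    return [(s + 1) % 8]

print(bounded_fwd_reach({0}, lambda s: s == 5, step, 10))
print(bounded_fwd_reach({0}, lambda s: s == 5, step, 3))
```

The second call shows the weakness discussed in the text: when the bound is too small, the procedure can only report that the risk state was not found within MaxBound steps.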
5.2 Toward Unbounded Reachability Analysis

Since the number of regions is exponential in the number of clocks, as well
as in each clock’s largest constant, the threshold is usually prohibitively high. In
IndFwdReach(), we combine loop-free Induction() with BoundedFwdReach() and
obtain a complete inductive algorithm for forward reachability analysis. Note
that, in Fig. 3, we denote the released formula as F/I, i.e. F with the clauses
of I removed. Two extra checks, for the loop-free requirement and for the induction,
are used to help determine completeness in early steps. If the loop-free requirement
is not satisfied, which means that no new states can be reached in the next step, the
procedure can terminate and the risk state is unreachable. For the induction, we
regard $P(\cdot)$ as $\neg R(\cdot)$, i.e. the negation of the risk condition. Once the
induction succeeds, we can conclude that all reachable states must satisfy
$\neg R(\cdot)$, which implies that the risk state is unreachable.

One limitation of this inductive method is that we cannot predict at which
step it will terminate with “success”. Although regions are finite and
we can guarantee success when MaxBound exceeds the number of regions, in
the worst case the inductive method may not terminate any earlier and
only induces overhead. However, when the inductive method is effective, it
can verify the given safety property within a handful of steps, regardless of the
diameter of the reachability graph. In Section 6, we conduct an experiment to
show the effectiveness of induction.

IndFwdReach(I, R, MaxBound)
var i:0..MaxBound;
begin
  i:=0; F:=I(B_0);
  loop forever
    if (¬SAT(F)) return “unreachable by loop-free”;
    else if (SAT(F ∧ R(B_i))) return “reachable”;
    else if (¬SAT((F/I) ∧ R(B_i))) return “unreachable by induction”;
    else if (i=MaxBound) return “unreachable within MaxBound steps”;
    else
      F:=F ∧ ¬R(B_i) ∧ (B_i → B_{i+1}) ∧ (⋀_{j=0..i} B_{i+1} ≠ B_j);
      i:=i+1;
end.



Fig. 3. Inductive Forward Reachability Analysis 
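
The four checks of Fig. 3 can be mimicked on an explicit graph. In this hedged sketch the models of F/I are enumerated as loop-free paths, and the models of F are those paths that start in an initial state. The function name `ind_fwd_reach` and the toy graph `SUCC` are hypothetical and only serve the illustration.

```python
def ind_fwd_reach(states, init, risk, succ, max_bound):
    """Explicit-state analogue of IndFwdReach (path enumeration, not SAT)."""
    # Models of F/I: loop-free paths B_0..B_i with B_0..B_{i-1} non-risk.
    free = [(s,) for s in states]
    for i in range(max_bound + 1):
        rooted = [p for p in free if p[0] in init]  # models of F
        if not rooted:
            return "unreachable by loop-free"
        if any(risk(p[-1]) for p in rooted):
            return "reachable"
        if not any(risk(p[-1]) for p in free):
            return "unreachable by induction"
        # F := F ∧ ¬R(B_i) ∧ (B_i → B_{i+1}) ∧ (B_{i+1} distinct from earlier states)
        free = [p + (t,) for p in free if not risk(p[-1])
                for t in succ[p[-1]] if t not in p]
    return "unreachable within MaxBound steps"


# Hypothetical toy graph: a chain 0..6 reachable from 0, and an
# unreachable component 7 -> 8 -> 9 in which state 7 has a self-loop.
SUCC = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: [6],
        7: [7, 8], 8: [9], 9: [9]}
print(ind_fwd_reach(range(10), {0}, lambda s: s == 9, SUCC, 10))
```

In this toy graph the induction check succeeds at step 3, although the reachable chain has seven states; this mirrors the observation that induction may finish well before the diameter of the reachability graph is exhausted.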



6 Experiment 

6.1 Cornell Single Sign-On 

Cornell Single Sign-On (CorSSO) [19] is a distributed service for network au- 
thentication. It delegates client identity checking to multiple servers by threshold 
1. p = 0, p := {1,2}; a := 0; reset{x,y};

2. p ≠ 0 ∧ x > TA, a := a+1; reset{x,y};

3. y < TE ∧ ((p=1 ∧ a > TH_1) ∨ (p=2 ∧ a > TH_2))

4. p := 0;



Fig. 4. Each client in the CorSSO protocol has two discrete variables and two clocks: p
recording the chosen sub-policy, a for the number of collected authentications, x used
to constrain the time spent collecting a certificate, and y used to constrain the expiry
time of the authorized sub-policy. There is also an auxiliary discrete variable to record
locations. Initially, all processes are in Authentication with all variable values equaling
zero.



cryptography. In CorSSO, there are three kinds of principals, namely,
authentication servers, application servers and clients. For a client to access the services
provided by an application server, he has to identify himself according to the
authentication policy specified by the application server. The authentication policy
consists of a list of sub-policies, each specifying a set of authentication servers.
A client is allowed to access the application server if he has complied with any
sub-policy, i.e. obtained sufficient certificates from the specified authentication
servers within a specified time.

Unlike monolithic authentication schemes where the server is usually over- 
loaded with authentication requests, the authentication policies in CorSSO allow 
a client to prove her identity by different criteria. With threshold cryptography, 
each criterion is divided into requests to multiple authentication servers. The au- 
thentication process is therefore distributed, so the load of each authentication 
server is more balanced. 

In our experiments, we model client behavior. In the simplified client model
shown in Figure 4, a client has only two locations: Authentication and Access.
In Authentication, he first chooses one of the two policies by setting the value
of p non-deterministically. If the first policy is chosen, i.e. p = 1, he needs to
collect more than $TH_1$ certificates from the authentication servers. Similarly, more
than $TH_2$ certificates are needed if the second policy is chosen. Then he starts
to collect certificates. If he has obtained sufficient certificates within the time
limit, he can then move to Access. Finally, he can discard the certificates and
reset the policy, i.e. p := 0, and then return to Authentication.

To model timing constraints, we use two clock variables x and y. Let us
suppose that it takes at least TA time units to acquire a certificate. Then a new
certificate can be added only after x exceeds TA, and once it is collected, x is reset
for the next certificate. Furthermore, all certificates for a sub-policy must be
collected within TE, which is modeled by y. Note that y is reset each time the
client chooses a new sub-policy.
6.2 Experimental Results

We compare the performance of our model checker with RED [35], a full TCTL
model checker for timed systems. In the experiments, we first verify the safety
property that all clients in Access have acquired the certificates required
by the chosen policy. Then we implant a bug by mistyping $TH_2$ for $TH_1$ in
transition 3 in Figure 4. This may cause a violation of the safety property
once $TH_1 < TH_2$. Systems with two to eleven clients are checked by both xBMC
and RED. Note that we did not turn on the symmetry reduction option
in RED, even though the systems under test are highly symmetric^1. Since the
present technique does not take symmetry into consideration, we think it would
be unfair to compare it with other heuristics. Both RED and xBMC report that the
safety property is satisfied for normal cases, and the manually inserted bugs are
detected by both tools as expected. The performance results^2 are shown in Table
1. Instead of exploring all regions in the system, xBMC guarantees the
correctness by induction at the third step. On the other hand, the traditional
reachability analysis in RED has to explore all representatives of equivalent states.
Consequently, the time spent by xBMC is only a fraction of that required by
RED^3. For all bug-inserted cases, xBMC reports that the property is falsified at
the 12th step. Since SAT-based BMC is efficient for finding defects in design, the
performance of xBMC is in accord with our expectations. Compared to RED,
xBMC needs only 3.33% of the time to find a counterexample in the system with
11 clients. Note that xBMC stops once a bug is detected, which means that the
performance in bug hunting may not necessarily depend on system size.



Table 1. Experimental Results of xBMC and RED

               Correctness Guarantee       Bug Hunting
# of clients   RED 5.0       xBMC 2.1      RED 5.0       xBMC 2.1
 2                 0.25s        0.03s          0.24s       10.00s
 3                 2.71s        0.03s          2.64s       29.11s
 4                18.24s        0.11s         17.50s       50.55s
 5                89.25s        0.26s         85.23s       99.81s
 6               338.86s        0.41s        316.28s      153.97s
 7              1076.37s        0.59s        990.16s      278.96s
 8              2960.56s        0.94s       2734.60s      554.69s
 9              7169.19s        4.94s       6545.04s      739.46s
10             15950.74s        5.87s      14727.29s      582.09s
11             33201.08s       12.38s      30722.57s      746.34s



^1 Symmetry reduction is not activated by default.

^2 All experiments were performed on a Pentium IV 1.7 GHz computer with 256MB of
RAM running the Linux operating system.

^3 RED performs much better if symmetry reduction is used. In fact, it outperforms
xBMC almost universally with the heuristic.
7 Related Work and Discussion

Due to the success of hardware verification by SAT-based techniques, SAT-based 
model checking has recently gained considerable attention among software ver- 
ification researchers. Clarke et al. [14] developed a SAT-based bounded model 
checker for ANSI C, and the present authors used xBMC to verify Web applica- 
tion code security in an earlier project [18]. Although both projects focused on 
software verification, neither supported timing behavior analysis. 

The verification of timed automata by checking satisfiability has been the 
topic of several research projects. Most of this work encodes the evaluation of
atomic constraints into variants of predicate calculus with real variables. Niebert et
al. [30] represented the bounded reachability problem in Boolean variables and
numerical constraints of Pratt’s difference logic, while Audemard et al. [7] took
a clock as a real variable, and reduced the bounded verification of timed systems
to the satisfiability of a math-formula with linear mathematical relations
having real variables. Moura et al. [27] also used real variables to represent infinite
state systems. Penczek et al. [31] [36] handled timing behavior by discretization,
in which they divided each time unit into 2n segments (n = number of clocks).
Finally, Yu et al. [38] explicitly encoded region automata and proved that the 
reachability analysis of dense-time systems is equivalent to solving the satisfia- 
bility of Boolean formulae iteratively. However, most of these approaches do not 
emphasize the correctness guarantee. 

Some researchers have tried to determine whether iterative satisfiability anal- 
ysis can terminate early if more restrictive formulae are generated based on sat- 
isfiability results from previous iterations. Moura et al. [28] achieved this by 
using induction rules to prove the safety properties of infinite systems. Although 
they were able to detect cases where early termination was possible, they could 
not guarantee termination. In [34], Sorea checked full LTL formulae based on 
predicate abstraction to extend BMC capabilities. Compared to encoding ab- 
stract predicates, encoding regions themselves provides at least two advantages 
- simplicity and an intrinsic bound for termination. 

McMillan [24] uses interpolations as an over-approximation of the reachable 
states. His technique not only verifies the correctness of safety properties, but 
also guarantees termination. However, it has yet to support timing analysis. 
Compared to interpolation, where the analysis of internal information in SAT- 
solvers is required, the inductive method can be implemented by treating SAT- 
solvers as black boxes. We would like to investigate the merging of interpolations 
and our approach in the future. 

Unlike other reachability analysis techniques for timed automata, discretiza- 
tion allows us to deploy the inductive method rather straightforwardly. However, 
it is unclear how to apply the same technique in BDD [12], DBM [21] or CRD 
[35]. It would also be interesting to develop a corresponding inductive method 
for them and compare their performance with our discretization approach. 
8 Conclusion and Future Work

BMC is more efficient in identifying bugs, especially for systems with a large
number of program variables; however, its performance in guaranteeing correctness
can be disappointing. With induction, it is now possible to prove safety
properties efficiently by BMC in some cases. With the help of discretization, we are able
to carry the success of discrete-system verification over to timing-behavior
analysis. We applied induction algorithms to our previous research on discretization
of region automata, and thereby reduced the reachability analysis of dense-time
systems to satisfiability. The results of our preliminary experiments indicate that
even without enhancements (e.g. symmetry reduction, forward projection, and
abstraction), our approach is more efficient than RED in correctness
guarantee as well as bug hunting. However, one limitation of our approach is that the
performance depends on whether and when the induction succeeds.

In the future, we plan to investigate enhancements to improve the
efficiency. Notably, the interpolation technique proposed in [24] is of interest to us.
Secondly, we would like to integrate xBMC with WebSSARI [18] in order to
verify the timing behavior of real-world Web applications.
Abstract. The main disadvantage of model checking is the state
explosion problem that can occur if the system being verified has many
asynchronous components. Many approaches have been proposed to deal
with this challenge. This paper extends an approach that suggests
combining static analysis and partition of model checking tasks into different
cases for reducing the complexity of model checking, and introduces
algorithms and a tool for the static analysis. This extended approach and
the tool are then applied to models of known authentication protocols
and operating procedures, which shows that the approach and the tool
could have a wide range of applications.



1 Introduction 

Although much effort has been put into the exploration of general techniques for
reducing models and increasing the efficiency of model checking, many realistic
systems are still too large to be manageable for existing model checking tools. So
it is still important to find techniques for special types of models. The motivation
of this paper is the verification of models with non-deterministic choices and
open environments. The basic idea is to find methods for adequately representing
different cases in such models, in order to successfully employ model checking
in the verification of complex models. An approach to dividing a model
checking task based on a case analysis has been proposed [15]. The approach is to
partition a model checking task into several cases and prove that the collection of
these cases covers all possible cases. In this approach, verification of the former
is done by model checking, while verification of the latter is done either by
using expert judgment or by static analysis for models satisfying certain criteria.
Some types of models that may satisfy such criteria were discussed in [15]. This
work extends the capability of the approach by considering additional types
of models and multiple case-bases, so that it can be applied to more types
of applications. By implementing the static analysis for certain types of
models, we avoid using expert judgment, which could be erroneous when the
models are complex, and we obtain the different sub-goals automatically instead of
doing the analysis and composing sub-goals manually. Finally, three application examples
are presented to show how the approach and the tool are used and to show that
the approach and the tool could have a wide range of applications. The modeling
language of systems used in this paper is Promela, the input language of the
model checking tool Spin [4,5], which is used in the subsequent verification of the
application examples.

This paper is organized as follows. In Section 2, we introduce the partition
strategy. In Section 3, we discuss static analysis of Promela models and introduce
our analysis tool. In Section 4, three application examples are presented. Section
5 discusses related work and Section 6 contains concluding remarks.



2 Case Based Partition Strategy 



The strategy was described in [15]. We first give an introduction to the
background of this strategy and then discuss an extension of this strategy to multiple
case-bases.



2.1 Basic Strategy 



Let $T$ be a system and $x$ be the variable array of $T$. The system is in the state
$v$ if the value of $x$ at the current moment is $v$. A trace of $T$ is a sequence of
states. The property of such a trace can be specified by PLTL formulas [2].

— $\varphi$ is a PLTL formula if $\varphi$ is of the form $z = w$, where $z \in x$ and $w$ is a value.

— Logical connectives of PLTL include:
  $\neg$ (negation), $\wedge$ (conjunction), $\vee$ (disjunction) and $\rightarrow$ (implication).
  If $\varphi$ and $\psi$ are PLTL formulas, then so are $\neg\varphi$, $\varphi \wedge \psi$, $\varphi \vee \psi$, and $\varphi \rightarrow \psi$.

— Temporal operators include:
  X (next-time), U (until), <> (future) and [] (always).
  If $\varphi$ and $\psi$ are PLTL formulas, then so are X $\varphi$, $\varphi$ U $\psi$, <> $\varphi$, and [] $\varphi$.

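Let $t$ be a trace of $T$. Let HEAD($t$) be the first element of $t$ and TAIL$^i$($t$)
be the trace constructed from $t$ by removing the first $i$ elements of $t$. For
convenience, we write TAIL($t$) for TAIL$^1$($t$). Let $t \models \varphi$ denote the
relation “$t$ satisfies $\varphi$”.

Definition 1. $t \models \varphi$ is defined as follows:

$t \models x = v$ iff the statement $x = v$ is true in HEAD($t$).

$t \models \neg\varphi$ iff $t \not\models \varphi$.

$t \models \varphi \wedge \psi$ iff $t \models \varphi$ and $t \models \psi$.

$t \models \varphi \vee \psi$ iff $t \models \varphi$ or $t \models \psi$.

$t \models \varphi \rightarrow \psi$ iff $t \models \varphi$ implies $t \models \psi$.

$t \models$ X $\varphi$ iff TAIL($t$) $\models \varphi$.

$t \models \varphi$ U $\psi$ iff $\exists k$ such that TAIL$^k$($t$) $\models \psi$ and TAIL$^i$($t$) $\models \varphi$ for $0 \le i < k$.

$t \models$ <> $\varphi$ iff $\exists k$ such that TAIL$^k$($t$) $\models \varphi$.

$t \models$ [] $\varphi$ iff $t \models \varphi$ and TAIL($t$) $\models$ [] $\varphi$.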
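
To make Definition 1 concrete, the sketch below evaluates a formula on an ultimately periodic trace, i.e. a finite prefix followed by a loop that repeats forever. Restricting attention to such traces is an assumption of this example, as are the tuple encoding of formulas and the function name `holds`; the operators F and G stand for <> and [].

```python
def holds(formula, prefix, loop):
    """Evaluate a PLTL formula on the infinite trace prefix . loop . loop ...
    States are dicts; `loop` must be non-empty."""
    states = list(prefix) + list(loop)
    n = len(states)
    nxt = [i + 1 for i in range(n)]
    nxt[-1] = len(prefix)  # the last position wraps to the start of the loop

    def ev(f):
        """Return, for every position i, whether TAIL^i(t) satisfies f."""
        op = f[0]
        if op == "atom":                       # x = v in HEAD
            return [s[f[1]] == f[2] for s in states]
        if op == "not":
            return [not a for a in ev(f[1])]
        if op in ("and", "or", "implies"):
            a, b = ev(f[1]), ev(f[2])
            if op == "and":
                return [x and y for x, y in zip(a, b)]
            if op == "or":
                return [x or y for x, y in zip(a, b)]
            return [(not x) or y for x, y in zip(a, b)]
        if op == "X":
            a = ev(f[1])
            return [a[nxt[i]] for i in range(n)]
        if op == "G":                          # greatest fixpoint, n rounds suffice
            a, sat = ev(f[1]), [True] * n
            for _ in range(n):
                sat = [a[i] and sat[nxt[i]] for i in range(n)]
            return sat
        if op in ("F", "U"):                   # least fixpoint, n rounds suffice
            a = [True] * n if op == "F" else ev(f[1])
            b = ev(f[1]) if op == "F" else ev(f[2])
            sat = [False] * n
            for _ in range(n):
                sat = [b[i] or (a[i] and sat[nxt[i]]) for i in range(n)]
            return sat
        raise ValueError(f"unknown operator {op!r}")

    return ev(formula)[0]


# Hypothetical trace: v is none in the first state and 1 forever after.
PREFIX, LOOP = [{"v": "none"}], [{"v": 1}]
CASE = ("G", ("or", ("atom", "v", "none"), ("atom", "v", 1)))
print(holds(CASE, PREFIX, LOOP))
```

The formula `CASE` has the shape [](v = none ∨ v = 1), which is the kind of case formula used by the partition strategy in the remainder of this section.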

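Let $\mathcal{T}$ be a set of traces.

Definition 2. $\mathcal{T} \models \varphi$ if and only if $\forall t \in \mathcal{T}.(t \models \varphi)$.

Definition 3. $\mathcal{T} \not\models \varphi$ if and only if $\forall t \in \mathcal{T}.(t \not\models \varphi)$.

Note that this definition of $\not\models$ may be different from the usual
interpretation of not $\models$. One may consider $\not\models$ as a new relation symbol. Let TOP($\mathcal{T}$) be
the set consisting of HEAD($t$) for all $t \in \mathcal{T}$ and SUB($\mathcal{T}$) be the set consisting of
TAIL($t$) for all $t \in \mathcal{T}$. From the above definitions, we derive the following:

$\mathcal{T} \models x = v$ iff the statement $x = v$ is true in $s$ for all $s \in$ TOP($\mathcal{T}$).

$\mathcal{T} \models \neg\varphi$ iff $\mathcal{T} \not\models \varphi$.

$\mathcal{T} \models \varphi \wedge \psi$ iff $\mathcal{T} \models \varphi$ and $\mathcal{T} \models \psi$.

$\mathcal{T} \models \varphi \vee \psi$ iff there are $\mathcal{T}'$ and $\mathcal{T}''$: $\mathcal{T} = \mathcal{T}' \cup \mathcal{T}''$ and $\mathcal{T}' \models \varphi$ and $\mathcal{T}'' \models \psi$.

$\mathcal{T} \models \varphi \rightarrow \psi$ iff there are $\mathcal{T}'$ and $\mathcal{T}''$: $\mathcal{T} = \mathcal{T}' \cup \mathcal{T}''$ and $\mathcal{T}' \not\models \varphi$ and $\mathcal{T}'' \models \psi$.

$\mathcal{T} \models$ X $\varphi$ iff SUB($\mathcal{T}$) $\models \varphi$.

$\mathcal{T} \models \varphi$ U $\psi$ iff there are $\mathcal{T}'$ and $\mathcal{T}''$:
$\mathcal{T} = \mathcal{T}' \cup \mathcal{T}''$, $\mathcal{T}' \models \psi$, $\mathcal{T}'' \models \varphi$ and SUB($\mathcal{T}''$) $\models \varphi$ U $\psi$.

$\mathcal{T} \models$ <> $\varphi$ iff there are $\mathcal{T}'$ and $\mathcal{T}''$:
$\mathcal{T} = \mathcal{T}' \cup \mathcal{T}''$, $\mathcal{T}' \models \varphi$ and SUB($\mathcal{T}''$) $\models$ <> $\varphi$.

$\mathcal{T} \models$ [] $\varphi$ iff $\mathcal{T} \models \varphi$ and SUB($\mathcal{T}$) $\models$ [] $\varphi$.

Let $\mathcal{T}$ be the set of the traces of $T$ and $\varphi$ be a propositional linear temporal
logic formula. Let $T \models \varphi$ denote the relation “$T$ satisfies $\varphi$”.

Definition 4. $T \models \varphi$ if and only if $\mathcal{T} \models \varphi$.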

Suppose that there is a model $M$ and a formula $\varphi$, and our task is to check
whether $\varphi$ holds in $M$, i.e. we would like to prove:

$M \models \varphi$.

The principle of this strategy is to partition the search space of $M$, so that the
formula $\varphi$ can be proved within each portion of the search space. In order to do
so, we have to characterize the different portions of the search space. The technique
for this characterization is to attach formulas to $\varphi$, so that in the verification
of $M \models \varphi$, only paths relevant to the attached formulas are fully explored (or
paths irrelevant to the attached formulas are discarded at an early phase of
model checking).
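Lemma 1. Let $\psi_1, \ldots, \psi_n$ be formulas such that $M \models \psi_1 \vee \cdots \vee \psi_n$. $M \models \varphi$ if
and only if $M \models \psi_i \rightarrow \varphi$ for all $i \in \{1, \ldots, n\}$.

Proof: It is obvious that $M \models \varphi$ implies $M \models \psi_i \rightarrow \varphi$. We prove that
$M \models \psi_i \rightarrow \varphi$ for all $i \in \{1, \ldots, n\}$ implies $M \models \varphi$ as follows.

— Let $\mathcal{M}$ be the set of traces of $M$. Since $\mathcal{M} \models \psi_1 \vee \cdots \vee \psi_n$, there are
$\mathcal{M}_1, \ldots, \mathcal{M}_n$ such that $\mathcal{M} = \mathcal{M}_1 \cup \cdots \cup \mathcal{M}_n$ and $\mathcal{M}_i \models \psi_i$ for all $i$.

— On the other hand, we have $\mathcal{M} \models \psi_i \rightarrow \varphi$, hence $\mathcal{M}_i \models \psi_i \rightarrow \varphi$ and $\mathcal{M}_i \models \varphi$
for all $i$. Therefore $\mathcal{M} \models \varphi$. □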

For a given model $M$, in order to be successful with this strategy, we have to
be sure that the proof of $M \models \psi_i \rightarrow \varphi$ is simpler than the proof of $M \models \varphi$ for
each $i$. Therefore $\mathcal{M}$ (the set of traces representing the behavior of $M$) should
have the following properties: $\mathcal{M}$ can be partitioned into $\mathcal{M}'_i$ and $\mathcal{M}''_i$ such that:

— $\mathcal{M}'_i \not\models \psi_i$ and $\mathcal{M}''_i \models \psi_i$;

— $\mathcal{M}'_i \not\models \psi_i$ can be checked with high efficiency;

— $\mathcal{M}''_i$ is significantly smaller than $\mathcal{M}$.

For the selection of $\psi_i$ (which determines $\mathcal{M}'_i$ and $\mathcal{M}''_i$), it would be better to
ensure that the $\mathcal{M}''_i$ (with $i = 1, \ldots, n$) are pair-wise disjoint, whenever this is possible. In
addition, we shall discharge $M \models \psi_1 \vee \cdots \vee \psi_n$ by one of the following methods:

— Application of static analysis;

— Application of expert knowledge.

The reason for not verifying $M \models \psi_1 \vee \cdots \vee \psi_n$ by model checking is that
the verification of this formula is not necessarily simpler than the verification of
$M \models \varphi$. In order to be able to discharge $M \models \psi_1 \vee \cdots \vee \psi_n$ easily by the proposed
methods, we restrict the formula $\psi_i$ to be of the form $[](x = v_0 \vee x = v_i)$ where
$x$ is a variable and $v_0, v_i$ are constants. We call the variable $x$ the case-basis of
a partitioning. Formally, we have the following theorem.

Theorem 1. Let $v$ be a variable and $none$ be the initial value of $v$. Suppose
that $v$ is changed at most once during an execution of $M$. $M \models \varphi$ if and only if
$M \models [](v \neq none \rightarrow v = i) \rightarrow \varphi$ for all $i \neq none$ in the range of variable $v$.

Proof: Suppose that $\{none, v_1, \ldots, v_n\}$ is the range of $v$. Since $v$ is changed at
most once during an execution of $M$, the traces of $M$ can be partitioned into $n$
subsets $\mathcal{M}_1, \ldots, \mathcal{M}_n$ such that $\mathcal{M}_i \models [](v = none \vee v = v_i)$ for $i \in \{1, \ldots, n\}$ and
therefore $M \models [](v = none \vee v = v_1) \vee \cdots \vee [](v = none \vee v = v_n)$. The rest
follows from Lemma 1. □
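
The effect of Theorem 1 can be illustrated on a toy set of traces. The sketch below makes several simplifying assumptions: traces are finite sequences of the values taken by the case-basis $v$, the property is an arbitrary Python predicate on a trace, and the names `case_formula_holds` and `check_by_cases` are hypothetical.

```python
NONE = "none"


def case_formula_holds(trace, vi):
    """[](v = none ∨ v = vi) on a trace given as the sequence of values of v."""
    return all(v in (NONE, vi) for v in trace)


def check_by_cases(traces, phi, values):
    """Check one sub-goal per value vi: every trace that satisfies the
    case formula for vi must also satisfy phi."""
    results = {}
    for vi in values:
        case = [t for t in traces if case_formula_holds(t, vi)]
        results[vi] = all(phi(t) for t in case)
    return results


# Hypothetical traces in which v changes at most once.
TRACES = [("none", "a", "a"), ("none", "none", "b"), ("none", "none", "none")]


def never_ends_in_b(trace):
    return trace[-1] != "b"


by_cases = check_by_cases(TRACES, never_ends_in_b, ["a", "b"])
direct = all(never_ends_in_b(t) for t in TRACES)
print(by_cases, direct)
```

Because $v$ changes at most once, every trace falls into at least one case, so the conjunction of the sub-goal results agrees with the direct check, while each sub-goal only has to inspect the traces of its own case.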



2.2 Multiple Case-Bases 

A model may have multiple case-bases, i.e. there may be several $x_j$ satisfying the
condition that $M \models \psi_1 \vee \cdots \vee \psi_n$ where $\psi_i$ is of the form $[](x_j = v_0 \vee x_j = v_i)$,
where $x_j$ is a variable and $v_0, v_i$ are constants. In the case of two case-bases $x$ and
$y$, we have $M \models \psi^x_1 \vee \cdots \vee \psi^x_m$ for $x$ and $M \models \psi^y_1 \vee \cdots \vee \psi^y_n$ for $y$. The set of
traces $\mathcal{M}$ of $M$ can then be divided into even smaller pieces $\mathcal{M}_{ij}$ ($i = 1, \ldots, m$ and $j = 1, \ldots, n$)
where $\mathcal{M}_{ij} \models \psi^x_i \wedge \psi^y_j$. Then the verification task $M \models \varphi$ can be divided into a set
of simpler verification tasks $M \models \psi^x_i \wedge \psi^y_j \rightarrow \varphi$. Generally, we have:

Theorem 2. Let $x_i$ ($i = 1, \ldots, l$) be a set of variables and $none_i$ be the initial value
of $x_i$. Suppose that $x_i$ is changed at most once during an execution of $M$ for
all $i$. $M \models \varphi$ if and only if $M \models []((x_1 \neq none_1 \rightarrow x_1 = v_{1j_1}) \wedge \cdots \wedge (x_l \neq
none_l \rightarrow x_l = v_{lj_l})) \rightarrow \varphi$ for all $v_{ij_i} \neq none_i$ in the range of variable $x_i$.

This theorem is an extension of Theorem 1 and is the basis for our analysis
tool Case-Basis Explorer, which is introduced in Section 3.

3 Static Analysis and Case-Basis Explorer 

In order to successfully find the variable satisfying Theorem 1, we develop a
tool for static analysis called Case-Basis Explorer (CBE). In CBE, we consider
models of particular types with a situation where non-deterministic choice is
present. Let the non-deterministic choice be represented by $choice(x, \vec{v})$ where
$x$ is the variable used to hold the chosen value and $\vec{v} = \{v_1, \ldots, v_n\}$ is the set of
the possible values. We consider two types of $choice(x, \vec{v})$ for CBE:



if                              if
:: ...; x = v_1; ...            :: ...; run p(v_1); ...
   ...                  and        ...
:: ...; x = v_n; ...            :: ...; run p(v_n); ...
fi;                             fi;



The variable name $x$ in the second type of $choice(x, \vec{v})$ has to be
extracted from the definition of the process type $p$. We refer to the first type
as $choice_1(x, \vec{v})$ and the second type as $choice_2(x, \vec{v})$. The set of traces of a
model of these types has the potential (depending on the successful static
analysis) to be divided into subsets such that each subset satisfies one of the following
formulas:

$[](x \neq none \rightarrow x = v_1), \ldots, [](x \neq none \rightarrow x = v_n)$

where $none$ is the initial value of $x$, before the non-deterministic value has been
assigned.

The purpose of the static analysis is to show that the partition of the search
space into these cases is complete, i.e. to show

$M \models [](x \neq none \rightarrow x = v_1) \vee \cdots \vee [](x \neq none \rightarrow x = v_n)$.

Basically, we analyze the model in order to determine the set of cases and to
ensure that $x$ is changed at most once in an execution of a model (in accordance
with Theorem 1), i.e. check the following conditions:

— The range of $x$ is $\{none, v_1, \ldots, v_n\}$.

— The value of $x$ is $none$ before $x$ is assigned a value, and after $x$ is assigned
a value it remains unchanged.

To locate $choice(x, \vec{v})$, CBE analyzes the structure of Promela programs with
the help of the popular lexical analyzer generator Lex [8] and parser generator
Yacc [8]. The following conditions are checked in CBE for both of the two types
of $choice(x, \vec{v})$:

— There is only one program segment that matches $choice(x, \vec{v})$.

— $choice(x, \vec{v})$ does not occur in a loop for the given pair $(x, \vec{v})$.

— The process declaration containing the program segment matching
$choice(x, \vec{v})$ is only instantiated (in a manner similar to procedure calls)
once during a run of the model.

For $choice_1(x, \vec{v})$, CBE will check whether $choice_1(x, \vec{v})$ satisfies the
following condition:

— The only statements that may update the value of $x$ are those in
$choice_1(x, \vec{v})$, i.e.
neither statements of the form “$x = v$” nor “$c?x_1, \ldots, x_{j-1}, x, x_{j+1}, \ldots, x_k$”
are allowed outside of the choice-statement.

For $choice_2(x, \vec{v})$, CBE will check the following conditions:

— Process $p$ is only instantiated once in each non-deterministic choice.

— Process $p$ is not instantiated outside $choice_2(x, \vec{v})$.

— Process $p$ has a formal parameter $x$ and the argument $x$ remains unchanged
in $p$.

For a given pair $(x, \vec{v})$, if $choice(x, \vec{v})$ satisfies all of the conditions, the
variable $x$ will be a case-basis of the verification task.



3.1 Case-Basis Explorer 

The Case-Basis Explorer works as follows. It may take one or two arguments. 
The first one (mandatory) is the file name for the Promela model and the second 
one is the file name for the formula to be verified. Suppose that the formula to be 
verified is provided together with the model. After an analysis, there are three 
cases. The first case is that no case basis is found. The second case is that CBE 
finds only one case basis. CBE outputs the following message: 

One case basis is found. The case basis is as follows:

Var     Initial Value    Potential Values
name    value            v_1, v_2, ..., v_n

There are n sub-goals to be verified. The suggested verification sub-goals
are written in file “filename”.
The third case is that CBE finds more than one case basis. In this case, CBE
outputs the following message: 

m case bases are found. The case bases are as follows:

Number   Var      Initial Value   Potential Values
1        name_1   value_1         v_11, v_12, ..., v_1n_1
...
m        name_m   value_m         v_m1, v_m2, ..., v_mn_m

Please choose one or more case basis:

Then the user inputs the chosen numbers and the verification sub-goals are
produced as in the previous case. If we omit the second argument (i.e. the property
to be verified), CBE will just try to find and present the case-bases, so that the user
can choose among them and compose appropriate sub-goals manually.



3.2 Discussion 

Summarizing the above discussion, we refine the steps of the case-based
verification strategy as follows:

— Use CBE to analyze the model to get the case-basis $x$ and its range $\vec{v}$.

— For each $v_i \in \vec{v}$, construct $\varphi_i = [](x \neq none \rightarrow x = v_i) \rightarrow \varphi$ as a sub-goal
for verification.

— Use the model checker Spin to check whether $M \models \varphi_i$ holds for $i = 1, \ldots, n$.

— When CBE returns more than one case-basis, we pick one or more to
construct sub-goals accordingly.
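
The construction of sub-goals in the steps above, including the combination of several case-bases in the sense of Theorem 2, can be sketched as follows. The function name `make_subgoals` and the textual formula layout are assumptions of this example; the strings mirror the notation of the paper and are not claimed to be the exact format that CBE writes or that a particular Spin version accepts.

```python
from itertools import product


def make_subgoals(case_bases, phi):
    """Build one sub-goal per combination of potential values.

    case_bases: list of (variable, initial_value, potential_values) triples.
    phi: the property to be verified, as a string.
    """
    goals = []
    for combo in product(*(values for _, _, values in case_bases)):
        parts = [f"({var} != {none} -> {var} == {value})"
                 for (var, none, _), value in zip(case_bases, combo)]
        goals.append(f"[]({' && '.join(parts)}) -> ({phi})")
    return goals


# Hypothetical case-bases x and y, each with initial value 0.
for goal in make_subgoals([("x", "0", ["1", "2"]), ("y", "0", ["1", "2"])], "p"):
    print(goal)
```

With one case-basis of n values this yields n sub-goals; with several case-bases the number of sub-goals is the product of the numbers of potential values, which is why the user is asked to choose which case-bases to combine.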

Limitations of CBE. It is worth mentioning some limitations of CBE. First,
it cannot trace changes to the values of array elements. Second, it cannot
trace the values in channels, so CBE cannot determine the exact range
of a variable $x$ if there are assignments like ch?x. The tool needs to be improved
in future research. Even so, the strategy and the tool are very useful for
model checking large models, as shown in the next section by
application examples.

Parallel computation. It is easy to take advantage of parallel and networked
computing power when the problem can be decomposed into independent
sub-problems. One problem is how to fully exploit the available computing resources.
It may not be possible (with the proposed strategy) to divide a problem in such
a way that all sub-problems require approximately the same amount of time for
model checking. The utilization of the available computing power can be better
if there are more sub-problems than available computing units.
In such cases, we may estimate the difficulty of the sub-problems and make a
schedule for the sub-problems.
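
One simple way to make such a schedule is the longest-processing-time-first heuristic, sketched below. This heuristic is our choice for illustration and is not prescribed by the strategy; the function name `schedule`, the sub-goal names and the cost estimates are hypothetical.

```python
import heapq


def schedule(estimates, n_workers):
    """Assign sub-goals to workers, hardest first, always giving the next
    sub-goal to the currently least loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    plan = {w: [] for w in range(n_workers)}
    for goal, cost in sorted(estimates.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        plan[worker].append(goal)
        heapq.heappush(heap, (load + cost, worker))
    makespan = max(load for load, _ in heap)
    return plan, makespan


# Hypothetical difficulty estimates for five sub-goals, two computing units.
ESTIMATES = {"g1": 5, "g2": 3, "g3": 3, "g4": 2, "g5": 1}
plan, makespan = schedule(ESTIMATES, 2)
print(plan, makespan)
```

Having more sub-goals than computing units gives the scheduler room to balance the load; here both workers end up with a total estimated cost of 7.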
4 Application Examples

We have chosen three application examples, two from protocol verification and 
one from verification of operating procedures. 

— The first one is the Needham-Schroeder-Lowe protocol, which is an
authentication protocol. Protocol verification is a traditional area of model checking and
this protocol has been used on several occasions to show that model
checking can reveal problems that might otherwise go undetected [6] and as a
representative of protocols of this category for studying modeling and model
checking strategies [11]. We use the model of the Needham-Schroeder-Lowe
protocol (the fixed version)^1 created according to the principle presented in
[11] to show that our strategy and the tool have advantages for this kind of
model.

— The second one is the TMN protocol. This protocol has also been used on
different occasions to show the use of model checking strategies [7] and other
methods [12] for improving such protocols. The model in [7] is specified in
CSP. Because we are using Spin for model checking, we have converted the
model into a Promela version and the discussion is based on the new model.

— The third one is an operating procedure. Operating procedures are
documents telling operators what to do in various situations including those
for surveillance and emergency handling. Verification of operating procedures
has been discussed in [13,14]. We use the Promela model of an operating
procedure discussed in [13].

By using the tool and the verification approach, we can reduce the search
space (and hence also the peak memory requirement) of the application examples by
between 19% and 73%. The first and the second examples are of the type
$choice_1(x, \vec{v})$ while the third example is of the type $choice_2(x, \vec{v})$. The first
and the third examples are free of errors with respect to the given properties,
while the second has an error with respect to one of the given properties. The
examples show that the tool and the verification approach could have a wide
range of applications.



4.1 Needham-Schroeder-Lowe Protocol 

Needham-Schroeder-Lowe protocol is a security protocol aiming at establishing 
mutual authentication between an initiator A and a responder B, after which 
some session involving the exchange of messages between A and B can take 
place. The following is a description of this protocol. 

$A \to B : \{n_a, A\}_{PK(B)}$

$B \to A : \{n_a, n_b, B\}_{PK(A)}$

$A \to B : \{n_b\}_{PK(B)}$

^1 Available from “www.informatik.uni-freiburg.de/~lafuente/models/models.html”,
which contains a database of Promela models.
Here A is an initiator who seeks to establish a session with responder B. A
selects a nonce $n_a$ and sends it along with its identity to B, encrypted
using B’s public key. When B receives this message, it decrypts the message to
obtain the nonce $n_a$. It then returns the nonce $n_a$ along with a new nonce
$n_b$ and its identity to A, encrypted using A’s public key. When A receives
this message, he should be assured that he is talking to B, since only B
should be able to decrypt the first message to obtain $n_a$. A then returns
the nonce $n_b$ to B, encrypted using B’s public key. Then B should be assured
that he is talking to A.

We use the model in [11], which is a model in Promela. There is one initiator
process, one responder process and one intruder process in the model. For the
correctness of the protocol, we have to express that initiator A and responder
B correctly authenticate each other. The property to be checked can be
expressed as the following PLTL formulas.

— $\psi_1 : \Box(\Box\neg IniCommitAB \lor (\neg IniCommitAB \;U\; ResRunningAB))$

— $\psi_2 : \Box(\Box\neg ResCommitAB \lor (\neg ResCommitAB \;U\; IniRunningAB))$

where

— IniRunningAB iff initiator A has initiated a protocol session with B;

— ResRunningAB iff responder B has initiated a protocol session with A;

— IniCommitAB iff initiator A has concluded a protocol session with B;

— ResCommitAB iff responder B has concluded a protocol session with A.

With the help of CBE, we obtain one case-basis in this model: 

$Pini : party$ with the range $\{none = 0, v_1 = I, v_2 = B\}$

The variable $Pini : party$ is a composed identifier from the proctype
Pini(...) that has three parameters self, party and nounce. This proctype is
used in the following context:



if
:: run Pini(A, I, Na);
:: run Pini(A, B, Na);
fi;



$Pini : party$ is the variable party (the name of a formal parameter) of the
proctype Pini(...). The constants A, I and B are identifiers for the
initiator, the intruder and the responder. The constant Na is the nonce of A.
CBE also detects $Pini : self$ and $Pini : nounce$ as potential case-bases;
however, they only have one case each and are not useful. After choosing
$Pini : party$ as the case-basis, for each property $\psi_i$ to be checked two
sub-goals are constructed as follows:

— $\psi_{i1} : \Box(Pini : party \neq none \to Pini : party = I) \to \psi_i$

— $\psi_{i2} : \Box(Pini : party \neq none \to Pini : party = B) \to \psi_i$
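
The construction of the sub-goals can be sketched as simple string generation. The sketch below assumes Spin-style LTL operators (`[]`, `->`, `U`, `!`, `||`); the function `make_subgoals` and the flat identifier `Pini_party` are hypothetical stand-ins for however the case-basis variable is exposed to the property, and this is not the CBE implementation.

```python
def make_subgoals(case_var, none_value, cases, prop):
    """Build one sub-goal per case: [](v != none -> v == case) -> prop."""
    return [
        f"([] (({case_var} != {none_value}) -> ({case_var} == {c}))) -> ({prop})"
        for c in cases
    ]

# psi_1 from this section, written with Spin-style operators.
psi1 = "[] (([] !IniCommitAB) || (!IniCommitAB U ResRunningAB))"

# `Pini_party` is a hypothetical name for the case-basis variable.
subgoals = make_subgoals("Pini_party", "none", ["I", "B"], psi1)
for goal in subgoals:
    print(goal)
```

Each generated formula can then be checked in a separate model checking run, one run per case.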
Each sub-goal is then verified independently by Spin. A comparison of the
usage of memory (indicated by the number of states and transitions) is shown
in Table 1. The first line gives the number of states and transitions for the
direct verification of $\psi_1$. The second and third lines give the numbers
for the verification of $\psi_{11}$ and $\psi_{12}$ according to our approach.
As shown in the table, the maximum number of states during the verification
has been reduced to about 81% of that of the original task.



Table 1. Verification of Needham-Schroeder-Lowe Protocol

Verification Task    States    Transitions
$\psi_1$             379       1192
$\psi_{11}$          309       1011
$\psi_{12}$          75        186
$\psi_2$             378       1275
$\psi_{21}$          309       1144
$\psi_{22}$          74        136



4.2 TMN Protocol 

TMN is a security protocol for exchanging session keys. In order for two agents 
to set up a secure session, communicating over an open channel, they must first 
decide upon a cryptographic session key, which should be kept secret from all 
eavesdroppers. We use the model in [7] as the basis of this verification example. 
According to the description of [7] the protocol works as follows. 

The TMN protocol concerns three players: an initiator A, a responder B, 
and a server S who mediates between them. The TMN protocol for establishing 
a session key involves the exchange of four messages which can be defined as 
follows: 



$A \to S : A.S.B.E(k_a)$

$S \to B : S.B.A$

$B \to S : B.S.A.E(k_b)$

$S \to A : S.A.B.V(k_a, k_b)$

When the initiator A wants to connect with the responder B, he chooses a key
$k_a$, encrypts it (by a standard encryption function; $E(k_a)$ is the result
of the encryption) and sends it to the server. The server sends a message to
the responder, telling him that A wants to start a session. The responder
acknowledges by choosing a session key $k_b$, encrypting it and sending it to
S. The server forms the Vernam encryption of the two keys he has received
($V(k_a, k_b)$ is the result of this encryption), and returns this to A. When
A receives this message, he can decrypt it using the key $k_a$ to recover the
session key $k_b$.
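
The last step can be illustrated with a short sketch. It assumes that the Vernam encryption $V(k_a, k_b)$ is the bitwise exclusive-or of two keys of equal length, which is the usual reading of a Vernam cipher; the key values below are made up for the illustration.

```python
def vernam(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("keys must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))

ka = bytes.fromhex("a3f1")   # illustrative key chosen by the initiator A
kb = bytes.fromhex("5c0e")   # illustrative session key chosen by B
v = vernam(ka, kb)           # V(ka, kb), returned by S in the fourth message
recovered = vernam(v, ka)    # A removes its own key to obtain kb
```

Because XOR is its own inverse, anyone who knows one of the two keys can recover the other from $V(k_a, k_b)$, which is why A can recover $k_b$ here.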
In addition to the three players, there is an intruder that can interact with
the protocol in any way we would expect a real attacker to be able to do. The
model in [7] is specified in CSP, and we convert it to a Promela model. The
principle of modeling TMN in Promela follows that presented in [11], and we
use correctness specifications similar to those of the previous subsection.
The properties to be checked are expressed as the following PLTL formulas:

— $\psi_1 : \Box(\Box\neg IniCommitAB \lor (\neg IniCommitAB \;U\; ResRunningAB))$

— $\psi_2 : \Box(\Box\neg ResCommitAB \lor (\neg ResCommitAB \;U\; IniRunningAB))$

With the help of CBE, we obtain one useful case-basis in this model (the
process is similar to that in Section 4.1):

$Pini : party$ with the range $\{none = 0, v_1 = I, v_2 = B\}$

For each property $\psi_i$ to be checked, two sub-goals are constructed as
follows:

— $\psi_{i1} : \Box(Pini : party \neq none \to Pini : party = I) \to \psi_i$

— $\psi_{i2} : \Box(Pini : party \neq none \to Pini : party = B) \to \psi_i$

Each sub-goal is verified independently. A comparison of the usage of memory
is shown in Table 2. As shown in the table, the verification of $\psi_2$
results in errors. In this case the model checker finds an error at an early
stage of state exploration and stops without exhaustively exploring the state
space of the model, so no advantage can be obtained with this strategy. In the
verification of $\psi_1$, no errors are found and the state exploration is
complete. The maximum number of states during this verification is reduced to
about 58% of that of the original task.



Table 2. Verification of the TMN Protocol

Verification Task    States    Transitions    Error
$\psi_1$             800       2158           no
$\psi_{11}$          341       900            no
$\psi_{12}$          464       1263           no
$\psi_2$             11        12             yes
$\psi_{21}$          10        13             yes
$\psi_{22}$          12        14             yes



Discussion. The strategy works better with respect to the reduction of peak
memory usage if there are more cases and if the search space is (roughly
speaking) evenly partitioned among the cases. The two cases in the first
application are not evenly balanced, whereas those in the second application
are better balanced. In the third application, there are more cases, and it
gets the best result out of this strategy among these applications.
4.3 Steam Generator Control Procedure

This case study is from [10] in which problems related to the verification of 
operating procedures by model checking were discussed. The purpose of this 
subsection is to show that the strategy and CBE can be applied to this type of 
verification models as well. 

The primary goal of the procedure in the case study is to repair a defective
control valve (there are four control valves). After identifying the defective
control valve, the basic action sequence of the operator is as follows:

— Start isolating the steam generator. 

— Manipulate the primary loop. 

— Continue isolation of the steam generator. 

— Repair the steam generator control valve and prepare to increase the power. 

— Increase the power. 

Before starting the process of repairing the control valve, one has to check
the power level. If it is not normal, a supervisor must be notified and this
procedure is terminated (in order to activate other procedures). Since there
are four valves and the assumption is that one of them may be defective at a
time, there are four cases in which one specific valve is defective. In
addition, there is one case in which none of the valves is defective. The
states of the valves are represented by an array $State[\,]$. The state of a
valve can be Defected or Normal (which are two different constants). A
defective valve v has the value $State[v] = Defected$. A property to be
verified is as follows:

$\Diamond(State[MainProcess] == Completed \;\land$
$\quad((State[CheckPointPowerLevel] \neq NOK \to$
$\qquad(State[RL33S002] == Normal \land State[RL35S002] == Normal \;\land$
$\qquad\ State[RL72S002] == Normal \land State[RL74S002] == Normal)) \;\land$
$\quad\ (State[CheckPointPowerLevel] == NOK \to$
$\qquad State[Supervisorl] == Notified)))$

The constants RL33S002, RL35S002, RL72S002 and RL74S002 denote the four
different valves. The constant NOK denotes the state “not normal”. The formula
states that the execution of the main process (i.e. the operating procedure)
would eventually be completed, and at the same time, if the power level at the
checkpoint is normal, then the defective control valve (if there is any) must
have been successfully repaired (achieved the state “Normal”); otherwise, if
the power level at the checkpoint is not normal, a supervisor must have been
notified. Let us denote this formula by $\psi$.

At the moment, the static analysis tool CBE only handles single variables and
not arrays. To separate the different cases, we introduce an extra variable
mycase, which has the value none initially and is assigned a value when the
operating procedure is dealing with one specific defective valve. The value of
mycase ranges over $\{none = 0, v_1 = 1, v_2 = 2, v_3 = 3, v_4 = 4, v_5 = 5\}$.
This modification does not affect the execution of the Promela model with
respect to the aforementioned property or the number of states (when the
introduced statements are packed into atomic steps with existing statements).

After such a modification, given the model and the property $\psi$, CBE
identifies that the variable mycase can be used as the case-basis and produces
the following five formulas:

— $\psi_1 : \Box(mycase \neq 0 \to mycase = 1) \to \psi$

— $\psi_2 : \Box(mycase \neq 0 \to mycase = 2) \to \psi$

— $\psi_3 : \Box(mycase \neq 0 \to mycase = 3) \to \psi$

— $\psi_4 : \Box(mycase \neq 0 \to mycase = 4) \to \psi$

— $\psi_5 : \Box(mycase \neq 0 \to mycase = 5) \to \psi$

Then we verify $\psi_i$ for i = 1, 2, 3, 4, 5 instead of directly verifying
$\psi$. A comparison of the memory usage is shown in Table 3. As shown in the
table, the maximum number of states during the verification is reduced to
about 27% of that of the original task.



Table 3. Verification of the Operating Procedure

Verification Task    States    Transitions
$\psi$               293126    1.65237e+008
$\psi_1$             79878     4.42901e+007
$\psi_2$             79878     4.42901e+007
$\psi_3$             79878     4.42901e+007
$\psi_4$             79878     4.42901e+007
$\psi_5$             9478      4.32181e+006
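
The reductions quoted for the three examples can be recomputed from the state counts in Tables 1 to 3. The sketch below takes the peak requirement of a split verification to be the largest sub-goal, which is an assumption that holds when the sub-goals are checked one after another; the dictionary keys are labels chosen for this illustration.

```python
# (states of the original task, states of its sub-goals), from Tables 1-3.
# Only the error-free verification tasks are listed.
tables = {
    "NSL psi_1": (379, [309, 75]),
    "NSL psi_2": (378, [309, 74]),
    "TMN psi_1": (800, [341, 464]),
    "Procedure psi": (293126, [79878, 79878, 79878, 79878, 9478]),
}

def peak_ratio(original, subgoals):
    """Largest sub-goal state count as a fraction of the original count."""
    return max(subgoals) / original

ratios = {name: peak_ratio(o, s) for name, (o, s) in tables.items()}
for name, r in ratios.items():
    print(f"{name}: peak states are {r:.1%} of the original task")
```

The ratios come out at roughly 0.82, 0.58 and 0.27, in line with the figures of about 81%, 58% and 27% given in the text.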



5 Related Work 

As mentioned at the beginning of the introduction, much effort has been put
into the exploration of techniques for reducing models and increasing the
efficiency of model checking. Some are implemented as a standard part of model
checking tools; for instance, on-the-fly model checking and partial order
reduction are implemented in the model checking tool Spin [4,5]. Others may be
used as pre-processing methods for reducing the complexity of models; for
instance, we may use cone of influence reduction [1] or program slicing [10].
The approach presented in this paper can be regarded as such a pre-processing
method, which can be automated (as is done in this work) and can be used in
combination with the aforementioned techniques. The approach looks similar to
a strategy proposed in [9]. A theorem in [9] is as follows:

If, for all i in the range of variable v, $\models \Box(v = i \to \varphi)$, then $\models \Box\varphi$.
The basic idea here is to break the proof of the temporal property
$\Box\varphi$ into cases based on the value of a given variable v. Suppose
that the value of v refers to a location in some large data structure (or
array); to prove any given case v = i, it is only necessary to refer to
element i of the array, and the other elements of the array can be eliminated
from the model by replacing them with an “unknown” value [9]. However, there
are two important differences: first, it is only applicable to safety
properties; second, when the strategy is used alone, it does not work well
with the model checker Spin, because we still have to search through the whole
state space to check whether the safety property holds (in each of the
subtasks). To use this strategy, appropriate abstraction techniques must be
applied before one obtains advantages.

6 Concluding Remarks 

This paper extends the approach introduced in an earlier paper [15]. Firstly,
we have considered the use of multiple cases for reducing the potential need
of memory in model checking; secondly, we have considered more types of models
for which this approach can be automated; finally, we have implemented and
introduced a tool for case-basis exploration, so that we avoid relying on
expert judgment, which could be erroneous when the models are complex, and
obtain the different sub-goals automatically instead of doing the analysis and
composing the sub-goals manually.

The application examples have illustrated the applicability and the advantage 
of the Case-Basis Explorer. Theoretically, for the best result, the complexity of 
the subtasks can be 1/n of the original one, if n subtasks can be produced 
according to the case-basis. In cases where memory is critical, this approach can 
be practical. This strategy can also be used as the basis for utilizing parallel and 
networked computing power for model checking, since the approach generates 
many independent sub-tasks. 

As pointed out earlier, CBE cannot find all variables and their ranges that
satisfy the condition of Theorem 1. One direction for further work is to
improve the system so that it can handle additional types of potential
case-bases. However, as the complexity of analyzing concurrent programs is
very high, we can only aim at a good approximation of completeness, in the
sense of finding all potential case-bases and their ranges, but not at
completeness itself.
Abstract. This paper presents a new algorithm for synthesising attacks
on cryptographic protocols. The algorithm constructs attacks where the 
attacker interacts with the agents in the system in order to discover 
some set of restricted information or send a specific message to another 
participant in the protocol. This attacker is akin to the Dolev-Yao 
model of attacker but behaves in a manner that does not betray its 
presence in the system. The state space reduction heuristics used in 
the algorithm manage the growth of the state space through the use of 
strict typing and by only allowing protocol specifications that do not 
deadlock. The cryptographic protocols are specified in the spi calculus. 
The variant of the spi calculus used is a syntactically restricted variant 
which is semantically equivalent to the full spi calculus. A model checker 
based on this algorithm has been implemented, and the results of this 
model checker on common cryptographic protocols are presented. This 
technique can be used to quickly search for an attack on a protocol. The 
knowledge of the structure of such an attack will enable the protocol 
designers to repair the protocol. 

Keywords: cryptographic protocols, security, model checking 



1 Introduction 

The widespread adoption of the Internet as a means of business has been lim- 
ited by fears of the security of transactions enacted on the Internet. Both the 
suppliers and the customers in an E-commerce transaction need to know that 
the transaction is secure and reliable. In particular each party in the transaction 
must be confident that the other party is who they say they are (authentication);
that sensitive and confidential information is not visible to parties other than 
those directly involved in the transaction (secrecy); that information transmit- 
ted between the parties involved in the transaction cannot be altered by a third 
party (integrity); and that the parties involved in the transaction cannot deny
that they took part in the transaction (non-repudiation).

Cryptography, such as Public-Key cryptography [1] [2], provides the necessary 
mechanisms to provide authentication, secrecy, integrity and non-repudiation. A 
cryptographic protocol provides the rules that define the exchange of informa- 
tion between the parties in order to establish a secure communications channel 
between them. The ability of this communications channel to deliver proper- 
ties such as authentication, secrecy, integrity and non-repudiation depends not 
only on the strength of the encryption algorithms but also on the cryptographic 
protocol that established the cryptographic channel. In order to build trust and 
confidence in these systems by the users of E-commerce, it is essential that we 
analyse the strength of the encryption algorithms and formally verify the correct- 
ness of the cryptographic protocols. This paper will focus on the cryptographic 
protocols. Analysing the strength of the encryption algorithms is beyond the 
scope of this work. 

Cryptographic protocols are specific instances of distributed systems. As with
other distributed systems, the formal verification of cryptographic protocols is 
a difficult task and many techniques have been proposed. Techniques based on 
theorem proving [3] [4] can verify large systems [5] but require an expert user who 
is proficient in the use of the theorem prover. Techniques based on model checking 
[6] [7] [8] [9] have the advantage of being automatic but can suffer a rapid growth 
in the state space which limits them to small systems. To handle large systems 
in model checkers, approaches such as abstract interpretation[10][11] have been
proposed. Using this approach, a large system A for which property X is to 
be verified is abstracted into a smaller system B and a related property Y. If 
property Y holds for system B then property X holds for system A. If property 
Y does not hold then nothing can be said about property X in system A. 

This paper presents a technique for synthesising an attacker from protocols 
specified using the spi calculus [12]. This technique will search for an attacker 
who can interact with the agents in the system to discover secret information. 
This work is motivated by our research in the IMPROVE project which focuses 
on the use of keys in Public-Key Infrastructures (PKI)[14].

The paper is structured as follows. In section 2 we introduce the spi calculus. 
This is the formalism which we will be using to specify and analyse cryptographic 
protocols. In section 3 we will present the Spycatcher Algorithm for synthesising 
attackers in protocols specified using the spi calculus. Section 4 will present the 
results of using the Spycatcher Algorithm to analyse some common protocols 
found in the literature and section 5 will present our conclusions. 

2 Spi Calculus 

The spi calculus [12] is derived from the π-calculus [15]. It extends the
π-calculus by adding primitives to support encryption and decryption; and, in
addition to an infinite set of names which are used to represent channels,
keys, etc., it also introduces an infinite set of variables that can be used
as formal parameters in various process constructs. This paper uses a variant
of the spi calculus. The grammar for terms and processes is given in Figure 1,
where c, k, m and n range over names and x, y and z range over variables. Let
$P[M/x]$ represent the capture-free substitution of all free occurrences of
the variable x in the process P by the term M. Let a variable list of arity 1
represent a single variable, i.e. $[x] = x$. Figure 2 defines the operational
semantics of the spi calculus. Let a message of arity 1, $[n]$, represent a
singleton name n. In this figure, $\to$ is the least relation on closed
processes that holds for Figure 2. Similarly, $\equiv$ is the least
equivalence relation on closed processes that holds for Figure 2. $fn(P)$ is a
function that defines the set of free names in P.

This variant of the spi calculus uses recursion rather than replication as
found in [12]. As shown in [16], the π-calculus using recursion is as
expressive as the π-calculus formulated using replication. This result carries
over to the spi calculus.



$M, N ::=$                                          terms
    $n$                                             name
    $[M, N, \ldots]$                                an n-ary message
    $\{M\}_{N}$                                     symmetric encryption
    $\{M\}_{N^+}$                                   public key encryption
    $\{M\}_{N^-}$                                   private key encryption
    $[x, y, z, \ldots]$                             variable list

$P, Q ::=$                                          processes
    $c([x, y, z, \ldots]).P$                        input
    $\overline{c}\langle M \rangle.P$               output
    $[M \text{ is } N]\, P$                         equality test
    case $N$ of $\{[x, y, z, \ldots]\}_{M}$ in $P$      symmetric decryption
    case $N$ of $\{[x, y, z, \ldots]\}_{M^+}$ in $P$    public key decryption
    case $N$ of $\{[x, y, z, \ldots]\}_{M^-}$ in $P$    private key decryption
    $P \mid Q$                                      parallel composition
    $(\nu n)P$                                      restriction
    $P\langle M \rangle$                            process instantiation
    $0$                                             nil

Fig. 1. Syntax of terms and processes



3 Spycatcher 

3.1 The Spy 

We model the spy using an extension of the Dolev-Yao model [17]. The spy in 
the Dolev-Yao model is an agent who can: 

— intercept and record messages on any communication channel;

— generate messages from the data it already knows and inject them into
any communications channel; and

— use its existing information and/or captured messages to decompose its
record of intercepted messages into their component parts.

$\overline{c}\langle [N_1, \ldots, N_l] \rangle.P \mid c([x_1, \ldots, x_l]).Q \to P \mid (N_1/x_1, \ldots, N_l/x_l).Q$    input/output reduction
$P \to P' \;\Rightarrow\; P \mid Q \to P' \mid Q$                                par invariance
$P \to P' \;\Rightarrow\; (\nu n)P \to (\nu n)P'$                                restriction invariance
$P \equiv P',\; P' \to Q',\; Q' \equiv Q \;\Rightarrow\; P \to Q$                reduction equivalence
$P \mid 0 \equiv P$                                                              par identity
$P \mid Q \equiv Q \mid P$                                                       par commutativity
$(P \mid Q) \mid R \equiv P \mid (Q \mid R)$                                     par associativity
$(\nu m)(\nu n)P \equiv (\nu n)(\nu m)P$                                         restriction commutativity
$(\nu n)0 \equiv 0$                                                              restriction identity
$(\nu n)(P \mid Q) \equiv P \mid (\nu n)Q$ if $n \notin fn(P)$                   scope extrusion
$P \equiv P' \;\Rightarrow\; P \mid Q \equiv P' \mid Q$                          par equivalence
$P \equiv P' \;\Rightarrow\; (\nu m)P \equiv (\nu m)P'$                          restriction equivalence
$[M \text{ is } N]\, P \to P$ iff $M = N$                                        equality test
case $\{[n_1, \ldots, n_l]\}_{K'}$ of $\{[x_1, \ldots, x_l]\}_{K}$ in $P \to (n_1/x_1, \ldots, n_l/x_l).P$ iff $K' = K$        symmetric decryption
case $\{[n_1, \ldots, n_l]\}_{K'}$ of $\{[x_1, \ldots, x_l]\}_{K^+}$ in $P \to (n_1/x_1, \ldots, n_l/x_l).P$ iff $K' = K^-$    public key decryption
case $\{[n_1, \ldots, n_l]\}_{K'}$ of $\{[x_1, \ldots, x_l]\}_{K^-}$ in $P \to (n_1/x_1, \ldots, n_l/x_l).P$ iff $K' = K^+$    private key decryption

Fig. 2. Operational Semantics of Spi Calculus

In addition, the spy may be considered as an honest agent by the other agents 
in the communication. The spy can partake in any instantiation of the protocol 
as an honest participant while at the same time intercepting, decomposing and 
synthesising messages. In our extension the spy is not permitted to act in a 
manner that will cause subprocesses of the protocol to block. We are proposing 
a “transparent Dolev-Yao” intruder that has all the properties of the standard 
Dolev-Yao intruder but also behaves in a manner that does not reveal its presence 
by causing the protocol to fail (block). 

3.2 Protocol Evolutions 

Consider a protocol that can perform an input/output reduction on channel c, 
as follows: 



$\ldots \mid \overline{c}\langle [M_1, \ldots, M_n] \rangle.P \mid c([y_1, \ldots, y_n]).Q \mid \ldots$



Normally, in the absence of a spy, this can reduce to: 

$\ldots \mid P \mid Q[M_1/y_1, \ldots, M_n/y_n] \mid \ldots$
However, the presence of a spy S can change how the protocol evolves. The
spy S can capture the message on channel c and use it to increase its
knowledge, let the message be received by process Q, or synthesise a message
on channel c and send it to process Q. Let $\omega$, the knowledge set of the
spy S, represent all the information known to the spy, namely its initial
knowledge plus all the information it has been able to accumulate by
intercepting messages and decrypting components of these messages when it has
the appropriate keys. Let decompose be a function that takes a knowledge set
and a message and decomposes the message into its components, decrypts these
components when it has the appropriate keys, and uses any keys it finds in the
decomposed message to decrypt any appropriately encrypted message in the
knowledge set. Let synthesise be a function that takes a knowledge set and
generates all possible messages, including encrypted components using the keys
in the knowledge set.
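
A minimal sketch of such a decompose function is given below. It covers only tuples and symmetric encryption, and the term encoding (strings for atomic names, tagged tuples for messages and ciphertexts) is an assumption made for this illustration, not the representation used in the implementation reported in this paper.

```python
def decompose(knowledge, message):
    """Close knowledge plus the new message under splitting and decryption.

    Terms (an illustrative encoding, not the paper's):
      str                    -> atomic name (nonce, key, agent identifier)
      ("msg", t1, ..., tn)   -> n-ary message
      ("enc", body, key)     -> body symmetrically encrypted under key
    """
    known = set(knowledge) | {message}
    changed = True
    while changed:                      # repeat until nothing new is learned
        changed = False
        for term in list(known):
            parts = ()
            if isinstance(term, tuple) and term[0] == "msg":
                parts = term[1:]        # split a message into its components
            elif (isinstance(term, tuple) and term[0] == "enc"
                  and term[2] in known):
                parts = (term[1],)      # decrypt when the key is known
            for p in parts:
                if p not in known:
                    known.add(p)
                    changed = True
    return known

old = {("enc", "secret", "k1")}               # ciphertext captured earlier
new = decompose(old, ("msg", "A", "k1"))      # a later message leaks the key
```

The fixpoint loop is what lets a key found in a newly intercepted message open a ciphertext that was already in the knowledge set, as the description of decompose requires.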

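Then the protocol fragment above can evolve in three ways in the presence
of a spy S with a knowledge set $\omega$.

1. $\ldots \mid P \mid Q[M_1/y_1, \ldots, M_n/y_n] \mid \ldots \mid S_{\omega}$

2. $\ldots \mid P \mid c([y_1, \ldots, y_n]).Q \mid \ldots \mid S'_{(\omega \,\cup\, decompose\ [M_1, \ldots, M_n]\ \omega)}$

3. $\ldots \mid \overline{c}\langle [M_1, \ldots, M_n] \rangle.P \mid Q[x_1/y_1, \ldots, x_n/y_n] \mid \ldots \mid S'_{\omega}$
where $[x_1, \ldots, x_n] \in (synthesise\ \omega)$ and $S = \overline{c}\langle [x_1, \ldots, x_n] \rangle.S'$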

The first evolution of the system is where the spy S does not interfere with the 
communication on channel c. The second evolution is where the spy S intercepts 
the message on channel c. Any information contained in the message on the 
channel can be used to increase the knowledge set of the spy. The third possibility 
by which the protocol can evolve is where the spy S constructs a message and 
sends it on the channel c. This possibility is not just a single evolution but 
a multiplicity of evolutions because the spy can use the synthesis function to 
construct any message from the data contained in its knowledge set $\omega$.

Searching a state space of evolutions is infeasible in practice unless some
heuristics are used to limit the size of the state space. In this paper we use
two heuristics to limit the size of the search space: strict typing and
non-blocking evolutions. The use of typing as a means of managing the size of
the search space is an approach that has been adopted by several
researchers[7][8][18][19][20]. In particular, [19] and [20] propose algorithms
for the generation of messages by the intruder based on the knowledge it has
accumulated by monitoring and interacting with the participants in the
protocol. A consequence is that techniques that use typing cannot detect type
attacks, where an agent in the protocol misinterprets one set of data as
another set of data. However, the use of ASN.1[13] in the protocol will
prevent these type attacks from occurring in the first place. The main
contribution of this paper is the use of non-blocking evolutions as a
heuristic to limit the state space that needs to be explored (section 3.4).
Since the intruder does not wish to disclose its presence by engaging in
actions that will result in the protocol failing (blocking), the intruder will
only generate messages that allow the protocol to continue to evolve. This is
an extension of the ideas proposed by Mitchell et al.[9].
3.3 Typing

Figure 3 gives an informal description of the Needham-Schroeder Public Key 
Protocol given in [21]. Message (1) is sent from agent A to agent B. Agent B 
expects to receive a message encrypted with its public key that has two compo- 
nents, a nonce and an agent identifier. There are implicit type constraints. If a 
message with a different type signature is received by B then the protocol will 
block. Since the spy is attempting to ascertain some (possibly secret) informa- 
tion it does not wish to cause the protocol to fail. If B is waiting for message 
(1) then the spy should only send a message to B that conforms to the type 
signature B is expecting. Spycatcher makes these typing constraints explicit and 
uses them to restrict the number of valid messages that a spy can generate. 
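
The effect of the typing constraint can be seen in a small sketch. It assumes a knowledge set of atomic values only, each tagged with a type name, and ignores encrypted components; `synthesise_typed` and the example values are hypothetical names chosen for this illustration.

```python
from itertools import product

def synthesise_typed(knowledge, signature):
    """Enumerate messages whose components match the expected types.

    knowledge: dict mapping each known atomic value to its type name.
    signature: list of type names the receiver expects,
               e.g. ["Nonce", "Agent"].
    """
    by_type = {}
    for value, ty in knowledge.items():
        by_type.setdefault(ty, []).append(value)
    # One list of candidates per position in the expected message.
    choices = [sorted(by_type.get(ty, [])) for ty in signature]
    return [list(combo) for combo in product(*choices)]

knowledge = {"Na": "Nonce", "Ni": "Nonce", "A": "Agent", "B": "Agent", "I": "Agent"}
typed = synthesise_typed(knowledge, ["Nonce", "Agent"])
untyped = list(product(sorted(knowledge), repeat=2))
```

With five known values, a receiver expecting a nonce followed by an agent identifier leaves the spy 6 well-typed candidates instead of the 25 untyped pairs.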

Figure 4 defines the syntax of the types allowed in Spycatcher. Keys in Spy- 
catcher are atomic terms. 



3.4 Non-blocking Evolutions 

We shall restrict the set of valid protocol definitions to those definitions
in which it is expected that equality tests and/or decryption tests will not
block in some subprocesses. The motivation for this is that if, during the
normal behaviour of a protocol, the exchange of messages between subprocesses
does not result in it blocking, then the only way in which a subprocess will
block is when it has received an ill-formed message, i.e. the spy has
synthesised a message that is not expected in the normal behaviour of the
protocol. Since we have restricted ourselves to looking for transparent
Dolev-Yao intruders, this enables the algorithm to limit the number of
messages that the spy has to synthesise.

(1) $A \to B : \{N_a, A\}_{K_B^+}$

(2) $B \to A : \{N_a, N_b\}_{K_A^+}$

(3) $A \to B : \{N_b\}_{K_B^+}$

Fig. 3. Needham-Schroeder Public Key Protocol (flawed version)

$M, N ::=$
    $n : Agent$                       an agent identifier
    $n : Nonce$                       a nonce
    $n : Name$                        a spi calculus name
    $k : Sym$                         a symmetric key
    $k : Pub$                         a public asymmetric key
    $k : Priv$                        a private asymmetric key
    $\{M, N, \ldots\}_{k : Sym}$      symmetric encryption
    $\{M, N, \ldots\}_{k : Pub}$      public asymmetric encryption
    $\{M, N, \ldots\}_{k : Priv}$     private asymmetric encryption
    $[M, N, \ldots]$                  an n-ary message

Fig. 4. Syntax of terms in Spycatcher

Consider the following process. 

$c([v : Agent, x]).$ case $x$ of $\{y : Nonce\}_{K : Sym}$ in $[y \text{ is } N_a]\, P$

Since the spy is attempting to ascertain some specific information, it does
not wish to cause the protocol to fail (block). For this process to evolve
into the process P, certain constraints are implicit on the message received
on channel c. In addition to the constraint of matching type signatures, the
variable x must be a symmetrically encrypted message containing a single
nonce, otherwise the decryption test will block. Since this nonce y in the
encrypted message is subsequently tested for equality against $N_a$, the
message received on channel c must have the following format if the process is
not to block and is to evolve into process P (where $\_ : Ty$ represents any
value of type Ty).

$[\_ : Agent, \{N_a : Nonce\}_{\_ : Sym}]$
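
The derivation of this format can be sketched as follows. The encoding of the input prefix and of the tests as tuples, and the function `required_pattern`, are assumptions made for this illustration; the sketch handles a single level of encryption and assumes well-formed steps, and it is not the Spycatcher implementation.

```python
def required_pattern(inputs, steps):
    """Refine the message expected on a channel so the receiver never blocks.

    inputs: list of (variable, type) pairs bound by the input prefix;
            type is None when the prefix leaves it open.
    steps:  the tests that follow the input, in order:
              ("case", var, (inner_var, inner_type), key_type)
              ("eq", var, name)
    Returns one pattern per input component; "_" stands for any value.
    """
    pattern = {v: ("_", ty) for v, ty in inputs}
    owner = {v: v for v, _ in inputs}   # input component each variable lives in
    for step in steps:
        if step[0] == "case":
            _, var, (inner, inner_ty), key_ty = step
            if var in owner:
                # var must be a ciphertext holding one value of inner_ty.
                pattern[owner[var]] = ("enc", ("_", inner_ty), ("_", key_ty))
                owner[inner] = owner[var]
        elif step[0] == "eq":
            _, var, name = step
            top = owner.get(var)
            if top is None:
                continue
            if var == top:
                # The tested variable is a top-level component.
                pattern[top] = (name, pattern[top][1])
            else:
                # The tested variable sits inside the ciphertext.
                tag, (_old, inner_ty), key = pattern[top]
                pattern[top] = (tag, (name, inner_ty), key)
    return [pattern[v] for v, _ in inputs]

# c([v : Agent, x]). case x of {y : Nonce}_{K : Sym} in [y is Na] P
pattern = required_pattern(
    [("v", "Agent"), ("x", None)],
    [("case", "x", ("y", "Nonce"), "Sym"), ("eq", "y", "Na")],
)
```

For the process above, the result corresponds to an arbitrary agent identifier followed by the nonce Na encrypted under an arbitrary symmetric key.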

Figure 5 gives a definition of processes with non-blocking evolutions, where
$NB(P)$ is a predicate that returns true if process P has a non-blocking
evolution and $sig(x)$ returns the type signature of x.



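$NB(\overline{c}\langle x \rangle.P \mid c(y).Q \mid R)$    $= NB(P \mid (y/x)Q \mid R)$    if $sig(x) = sig(y)$
                                                            $= false$                       otherwise
$NB(\text{case } x \text{ of } \{M\}_k \text{ in } P \mid Q)$    $= NB(P \mid Q)$           if $x = \{M\}_k$
                                                            $= false$                       otherwise
$NB([x \text{ is } y]\, P \mid Q)$                          $= NB(P \mid Q)$                if $x = y$
                                                            $= false$                       otherwise
$NB((\nu x)P \mid Q)$                                       $= NB(P \mid Q)$
$NB(P\langle M \rangle \mid Q)$                             $= NB((x/M)P \mid Q)$
$NB(0)$                                                     $= true$
$NB(P)$                                                     $= NB(Q)$                       if $P \equiv Q$

Fig. 5. Definition of Non-blocking Evolutions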



In general, given the restriction of only allowing non-blocking evolutions of a 
protocol, a value v received on a channel is expanded on by scanning the current 
subprocess until it reaches a null process or process instantiation. During this 
scanning: 
- If an input c([x]).P is encountered, add the components of x to the input
set.

- If an equality test [x is y] is encountered and the value v being expanded
is x and the guard y is not in the input set, then replace the value v with y.

- If a decryption test case x of {m}k in P is encountered and the value v being
expanded is x and {m}k is not in the input set, then replace v with {m}k.

- If an output c⟨y⟩.P is encountered in the subprocess and v is an element of
the message y, then find the corresponding c(x).Q in a receiving subprocess
where the type signature of x matches the type signature of y. Replace v with
its corresponding element in x and scan the subprocess Q until it reaches a
null process or process instantiation.

For example, consider the following process. 

c([v, ...]).a⟨v⟩.X1 |
a(id).[id is B : Agent] Y1

An agent identifier v received by a subprocess X on channel c is subsequently
sent on channel a and received by another subprocess Y, whose corresponding
element is id. If a subsequent equality test is performed in subprocess
Y between id and B, we can replace v in the signature of the message to be
received on channel c in the original subprocess X by the value B : Agent. Any
message created by the spy that does not contain B : Agent as the first element of
the message to be sent on channel c will cause the second subprocess to block.
Special care needs to be taken if the value against which v is being tested has
yet to be received from another subprocess, since we cannot replace v while
its value is not yet known.

This is an extension of the ideas presented by Mitchell et al. in [9] where 
they propose that if a receiver of a message discards messages without a correct 
checksum then the adversary should avoid generating these useless messages. In 
this paper the subprocess that receives a message and any subprocess to which it 
subsequently communicates any element of the message is scanned to see if the 
values received in the message are tested against any specific value in equality 
tests or decryption tests. If the received values are tested then they must have 
specific values in order for the subprocesses to evolve. These specific values can 
be used to reduce the number of valid messages that the intruder can generate. 
This significantly reduces the size of the state space to be searched.
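The effect of such constraints on the number of messages the spy must consider can be seen in a toy enumeration. The sketch below is illustrative only: candidate_messages, the knowledge dictionary and the two-element signature are assumptions made for the example and are not taken from the Spycatcher implementation.

```python
from itertools import product


def candidate_messages(knowledge, signature, constraints=None):
    """Enumerate well-typed messages the spy could build for a type signature.

    knowledge   maps a type name to the set of values of that type the spy knows.
    signature   is the tuple of types expected on the channel.
    constraints maps a position to the single value that position must hold
                (as derived from equality or decryption tests in the receiver).
    """
    constraints = constraints or {}
    domains = []
    for position, type_name in enumerate(signature):
        if position in constraints:
            domains.append([constraints[position]])
        else:
            domains.append(sorted(knowledge[type_name]))
    return list(product(*domains))


knowledge = {"Agent": {"A", "B", "C"}, "Nonce": {"Na", "Nc"}}
signature = ("Agent", "Nonce")

unconstrained = candidate_messages(knowledge, signature)
# The receiver tests the first element against B, so only B can appear there.
constrained = candidate_messages(knowledge, signature, {0: "B"})
```

With three known agents and two known nonces, fixing the first component to B reduces the candidates from six to two; the saving compounds at every input step of an evolution.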

3.5 The Algorithm 

Basic structure. The basic structure of the Spycatcher algorithm is given in
figures 6 and 7. In these figures a record with elements itemA, itemB and itemC
is represented as {itemA, itemB, itemC}. In figure 6, catch_all is a procedure
that takes a protocol, a spy identifier, a knowledge set and a goal. The goal is
expressed in terms of what information the spy has acquired and/or whether
the spy can send a specific message on a specific channel. When the spy is a
bona fide agent in the protocol, specifying the identifier of the spy enables all
the actions performed by the spy as part of the protocol to be recorded as well as
the “out-of-protocol” actions. “In-protocol” events are input and output events
that a spy performs that are part of the normal behaviour of an honest agent
in the protocol. “Out-of-protocol” events are all other events. The knowledge set
specifies the initial knowledge of the spy. This includes all the public keys of the
agents, its own private key(s), any symmetric keys initially known by the spy,
the identifiers of the agents in the protocol and a set of nonces. The procedure
catch_all calls the procedure multi_catch, which recursively calls the procedure
step_catch until all unpruned evolutions of the protocol have been checked. The
step_catch procedure (figure 7) takes an evolution from the list of evolutions
to explore and checks whether it has previously processed this evolution or a
related evolution. If this is the case, then expanding the current evolution will
not satisfy any unresolved elements of the goal and this evolution is discarded.
If the evolution has not been seen before, step_catch checks whether it is possible
to satisfy elements of the goal from this evolution. If satisfying elements of the
goal might be possible from the current evolution, step_catch generates a list of
all possible evolutions from the current evolution due to input/output reductions
(cf. figure 2). All reductions other than input/output reductions are
then performed on each evolution in this list. Duplicate evolutions are removed
prior to inserting these new evolutions into the list of evolutions to explore. The
current evolution is added to the set of previously processed evolutions.

multi_catch returns a list of all the attacks on the protocol that satisfy the
goal. These attacks will be in the form of the input and output events, both
“in-protocol” and “out-of-protocol”, that the spy participated in during the
evolution that resulted in the spy satisfying the goal. The minimise_spies procedure
scans the list of generated attacks and removes redundant actions and duplicate
attacks.

The algorithm maintains the list of evolutions yet to be explored and checks 
whether the current evolution being analysed has been previously processed. 
Specifically, it checks whether any previously encountered evolution, as stored 
in the SeenSet, had the same protocol state as the current evolution and if the 
knowledge set of the current evolution is a subset of the knowledge set of the 
previous evolution. If so, then expanding the current evolution will not satisfy 
any unresolved elements of the goal and this evolution is discarded. 
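The SeenSet check described above amounts to a subsumption test. The following sketch assumes, purely for illustration, that a protocol state is any hashable value and that a knowledge set is a frozenset of atomic names; the function name covered is hypothetical.

```python
def covered(state, knows, seen_set):
    """True if some processed evolution had the same protocol state and a
    knowledge set that contains the current one (so nothing new can follow)."""
    return any(state == seen_state and knows <= seen_knows
               for seen_state, seen_knows in seen_set)


# One previously processed evolution: state "P1" with four known items.
seen_set = {("P1", frozenset({"A", "B", "Kc", "Na"}))}

same_state_less_knowledge = covered("P1", frozenset({"A", "Na"}), seen_set)
same_state_new_knowledge = covered("P1", frozenset({"A", "Nb"}), seen_set)
different_state = covered("P2", frozenset({"A"}), seen_set)
```

Only the first query is pruned: the spy knew strictly more the last time it reached state P1, so re-exploring from there cannot satisfy any further goal element.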

Spycatcher implements a conservative test to check if it is possible for an 
evolution to satisfy the goal from its current state. If any of the elements of the 
goal depend on knowledge that is not in the knowledge set of the spy or that 
is not part of an encrypted or unencrypted message, and no event exists in the 
current evolution of the protocol that outputs the element in the clear or as part 
of an encrypted or unencrypted message, then it is impossible for the spy to ever 
satisfy that element of the goal. 
procedure multi_catch({evolutions, seen}, spy_id, goal)
begin
  if evolutions = empty_list
  then
    return(empty_list)
  else
    {process, spy, knows} = list_head(evolutions)
    if successful_attack(goal, knows, process)
    then
      return(list_insert({spy, knows},
                         multi_catch({list_tail(evolutions), seen}, spy_id, goal)))
    else
      res = step_catch({evolutions, seen}, spy_id, goal)
      return(multi_catch(res, spy_id, goal))
    endif
  endif
end

procedure catch_all(process, spy_id, goal, knows)
begin
  return(multi_catch({list({process, null, knows}), empty_set}, spy_id, goal))
end



Fig. 6. The Spycatcher Algorithm 



procedure step_catch({evolutions, seen}, spy_id, goal)
begin
  if evolutions = empty_list
  then
    return({evolutions, seen})
  else
    {process, spy, knows} = list_head(evolutions)
    if element_of({process, knows}, seen)
    then
      return({list_tail(evolutions), seen})
    else
      return({append(remove_duplicates(
                       resolve_non_io(
                         resolve_io(list_head(evolutions), spy_id))),
                     list_tail(evolutions)),
              add_to_set({process, knows}, seen)})
    endif
  endif
end



Fig. 7. The Spycatcher Algorithm (continued) 
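The control flow of figures 6 and 7 can be sketched as an iterative worklist loop. This is a simplified illustration, not the Standard ML implementation: the protocol semantics are abstracted into a caller-supplied successors function, the goal into a goal_reached predicate, and the toy transition system at the end is invented for the example.

```python
def catch_all(initial, goal_reached, successors):
    """Explore evolutions depth-first, collecting every one that meets the goal.

    initial      is a (state, knows) pair; knows is a frozenset.
    goal_reached is a predicate on the knowledge set (stands in for successful_attack).
    successors   maps (state, knows) to the list of next evolutions
                 (stands in for resolve_io followed by resolve_non_io).
    """
    evolutions = [initial]          # list of evolutions still to explore
    seen = set()                    # previously processed evolutions
    attacks = []
    while evolutions:
        state, knows = evolutions.pop(0)
        if goal_reached(knows):     # multi_catch: record the attack, continue with the tail
            attacks.append((state, knows))
            continue
        if (state, knows) in seen:  # step_catch: discard an already processed evolution
            continue
        seen.add((state, knows))
        fresh = []
        for evolution in successors(state, knows):
            if evolution not in fresh:          # remove_duplicates
                fresh.append(evolution)
        evolutions = fresh + evolutions         # append(new, list_tail(evolutions))
    return attacks


def toy_successors(state, knows):
    """Hypothetical system: at each step the spy may or may not learn item m<state>."""
    if state >= 3:
        return []
    return [(state + 1, knows | {f"m{state}"}), (state + 1, knows)]


attacks = catch_all((0, frozenset()), lambda knows: "m1" in knows, toy_successors)
```

In this toy system two evolutions reach the goal of learning m1, one with and one without m0 in the knowledge set.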
4 Experimental Evaluation

4.1 Needham-Schroeder Public Key Protocol 

In the Needham-Schroeder Public Key protocol [21] given in figure 3, the process
of agent i that initiates the protocol with agent j, Send(i,j), and the process of
agent i that responds to an initiation of the protocol by agent j, Recv(i,j), can
be modeled in the spi calculus as:

Send(i,j) = (νNi) ij1⟨{Ni, i}_{Kj+}⟩.ji2(x).
            case x of {n, n1}_{Ki-} in [n is Ni] ij3⟨{n1}_{Kj+}⟩.Send(i,j)

Recv(i,j) = (νNi) ji1(x).case x of {n, id}_{Ki-} in [id is j] ij2⟨{n, Ni}_{Kj+}⟩.
            ji3(y).case y of {Ni}_{Ki-} in Recv(i,j)

An instantiation of the protocol between agents A and B in which both A 
and B can initiate a session can be modeled as: 

NS_AB = Send(A,B) | Recv(A,B) | Send(B,A) | Recv(B,A)



4.2 Otway-Rees Protocol 

A common, though faulty, simplification of the Otway-Rees protocol [22],
as presented in [23] [3], is given informally in figure 8.



(1) A → B : [Na, A, B, {Na, A, B}_{Kas}]

(2) B → S : [Na, A, B, {Na, A, B}_{Kas}, Nb, {Na, A, B}_{Kbs}]

(3) S → B : [Na, {Na, Kab}_{Kas}, {Nb, Kab}_{Kbs}]

(4) B → A : [Na, {Na, Kab}_{Kas}]



Fig. 8. Simplified Otway-Rees Protocol 

The process of agent i that uses the protocol to send the message Mij to
agent j, Send(i,j,Mij), and the process of agent i that receives a message from
agent j, Recv(i,j), can be modeled in the spi calculus as:

Send(i,j,Mij) = (νNi)
    ij1⟨[Ni, i, j, {Ni, i, j}_{Kis}]⟩.ji4([n, yc]).[n is Ni]
    case yc of {n1, k}_{Kis} in [n1 is Ni] ij5⟨{Mij}_k⟩.0

Recv(i,j) = (νNi)
    ji1([n, j1, i1, yc]).iS2⟨[n, j1, i1, yc, Ni, {n, j1, i1}_{Kis}]⟩.
    Si3([n2, xc, zc]).case zc of {n3, k}_{Kis} in [n3 is Ni]
    ij4⟨[n2, xc]⟩.ji5(mc).case mc of {M}_k in F(M)

The server process Srv used when agent i wishes to send data to agent j can 
be modeled in the spi calculus as: 
Srv(i,j) = jS2([n, s, d, pc, n1, pc1]).[s is i] case pc of {n2, s1, d1}_{Kis} in
    [n2 is n] [s1 is s] [d1 is d] [d is j] case pc1 of {n3, s2, d2}_{Kjs} in
    [n3 is n] [s2 is s] [d2 is d]
    (νk) Sj3⟨[n, {n, k}_{Kis}, {n1, k}_{Kjs}]⟩.Srv(i,j)

An instantiation of the protocol between agents A and B in which both A 
and B can initiate a session can be modeled as: 

OR_AB = Send(A,B) | Recv(A,B) | Srv(A,B) | Send(B,A) | Recv(B,A) |
        Srv(B,A)



4.3 Results 

The Spycatcher algorithm has been implemented in Standard ML [25] and
compiled with MLton [26]. MLton is a whole-program optimising compiler for the
Standard ML language. The program was executed on a 500 MHz Pentium PC
with 512 MB of memory under Linux. The Spycatcher algorithm was tested with
three protocols. The first two were the Needham-Schroeder protocol and the
Needham-Schroeder-Lowe protocol. For both protocols the system was defined with three
agents A, B and C, where agent C is the spy and agent A can communicate with
both agents B and C. Finally, the algorithm was tested with a flawed version of
the Otway-Rees protocol where two honest agents A and B can initiate a session
with each other via a trusted server S in the presence of an untrusted agent C
(the spy).



The following are the timings on the 3 protocols. 



Protocol                  Time to find first attack   Time to find all attacks
Needham-Schroeder         0.988s                      2.25s
Needham-Schroeder-Lowe    3.435s†                     3.435s†
Otway-Rees                1.1519s                     4702s

† Since the Needham-Schroeder-Lowe protocol has no attacks, this time represents
the time to complete the search for all possible attacks and discover that none
exist.



The Spycatcher algorithm found the following attack on the Needham- 
Schroeder protocol. 

ac1({Na, A}_{Kc+}).ab1({Na, A}_{Kb+}).ba2({Na, Nb}_{Ka+}).
ca2({Na, Nb}_{Ka+}).ac3({Nb}_{Kc+}).0

This is essentially Lowe’s attack [27]. Once an honest agent tries to initi- 
ate a session with a dishonest agent then the dishonest agent can impersonate 
the honest agent to another targeted agent and use the honest agent as an 
oracle for the response of the targeted agent. The protocol can be corrected 
by simply adding the agent identifier of the receiver to the second message. 
The Spycatcher algorithm could not construct a successful attack against this 
Needham-Schroeder-Lowe protocol. 
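The attack and its correction can be replayed symbolically. The sketch below is a hand-written illustration, not output of Spycatcher: encryption is modeled as a tagged tuple that only the key owner can open, and the helper names (enc, responder_reply, initiator_reply, run_attack) are hypothetical.

```python
def enc(payload, owner):
    """Symbolic public-key encryption: only `owner` can read the payload."""
    return ("enc", tuple(payload), owner)


def responder_reply(msg, me, my_nonce, lowe_fix=False):
    """Message 2: the responder decrypts {n, sender} and answers the claimed sender."""
    _, (nonce, sender), owner = msg
    assert owner == me
    payload = [nonce, my_nonce] + ([me] if lowe_fix else [])
    return enc(payload, sender)


def initiator_reply(msg, me, peer, my_nonce, lowe_fix=False):
    """Message 3: the initiator checks its nonce (and, if fixed, the responder id)."""
    _, payload, owner = msg
    if owner != me or payload[0] != my_nonce:
        return None                     # the process blocks
    if lowe_fix and payload[2] != peer:
        return None                     # identifier mismatch: the process blocks
    return enc([payload[1]], peer)


def run_attack(lowe_fix=False):
    """Return True if the spy C ends up knowing B's nonce Nb."""
    spy_knows = {"A", "B", "C"}

    def learn(msg):
        # The spy can only open messages encrypted under its own public key.
        if msg is not None and msg[2] == "C":
            spy_knows.update(msg[1])

    m1 = enc(["Na", "A"], "C")          # A starts a session with C
    learn(m1)
    m1_forged = enc(["Na", "A"], "B")   # C, posing as A, starts a session with B
    m2 = responder_reply(m1_forged, "B", "Nb", lowe_fix)
    learn(m2)                           # encrypted for A: the spy learns nothing
    m3 = initiator_reply(m2, "A", "C", "Na", lowe_fix)   # C relays m2 to A
    learn(m3)
    return "Nb" in spy_knows
```

Calling run_attack() yields True, whereas run_attack(lowe_fix=True) yields False because A blocks on the unexpected responder identifier.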

The Spycatcher algorithm constructed 30 different attacks against the
simplified Otway-Rees protocol in figure 8. These different attacks belong to two
closely related groups of attacks. The basic structure of the first group of
attacks is given in figure 9. In figures 9 and 10 the notation 1-x and 2-x in the
first column represents the xth step in the first and second parallel session
respectively, and the notation y-x’ represents a repetition of the xth step of the
yth parallel session (with different participants in different roles). The notation
X_Y represents agent X masquerading as agent Y. In the first group of attacks
the spy initiates a parallel session with an honest agent after the two honest
agents have initiated a session between them (as in figure 9). In the second
group of attacks (given in figure 10) the spy initiates a session with an honest
agent and then waits until the honest agent initiates another session with some
other honest agent. This attack, which relies on the parties participating in the
protocol not aborting after a specified time period, is generally not reported
in the literature but is found by Spycatcher. The reason for the larger number
of constructed attacks is that the recipient of the first message of the protocol
cannot decrypt the fourth component of the message since it is encrypted with
a key it doesn’t know. Therefore the spy can put any nonce or agent identifiers
it knows into this message as long as the message it relays to the server contains
sensible information in the corresponding fields.



(1-1)    A → C_B :  [Na, A, B, {Na, A, B}_{Kas}]
(2-1)    C → A   :  [Nc, C, A, {Nc, C, A}_{Kcs}]
(2-2)    A → C_S :  [Nc, C, A, {Nc, C, A}_{Kcs}, Na’, {Nc, C, A}_{Kas}]
(2-2’)   C_A → S :  [Nc, C, A, {Nc, C, A}_{Kcs}, Na, {Nc, C, A}_{Kas}]
(2-3)    S → C_A :  [Nc, {Nc, Kca}_{Kcs}, {Na, Kca}_{Kas}]
(1-4)    C_B → A :  [Na, {Na, Kca}_{Kas}]

Fig. 9. Basic Attack on Simplified Otway-Rees Protocol



(1-1)    C → A   :  [Nc, C, A, {Nc, C, A}_{Kcs}]
(1-2)    A → C_S :  [Nc, C, A, {Nc, C, A}_{Kcs}, Na’, {Nc, C, A}_{Kas}]
(2-1)    A → B   :  [Na, A, B, {Na, A, B}_{Kas}]
(2-2)    B → S   :  [Na, A, B, {Na, A, B}_{Kas}, Nb, {Na, A, B}_{Kbs}]
(2-3)    S → B   :  [Na, {Na, Kab}_{Kas}, {Nb, Kab}_{Kbs}]
(2-4)    B → C_A :  [Na, {Na, Kab}_{Kas}]
(1-2’)   C_A → S :  [Nc, C, A, {Nc, C, A}_{Kcs}, Na, {Nc, C, A}_{Kas}]
(1-3)    S → C_A :  [Nc, {Nc, Kca}_{Kcs}, {Na, Kca}_{Kas}]
(2-4’)   C_B → A :  [Na, {Na, Kca}_{Kas}]







Fig. 10. Alternative Attack on Simplified Otway-Rees Protocol 
5 Conclusions

This paper presents the Spycatcher Algorithm which is designed to synthesise 
attackers in protocols specified in a variant of the spi calculus. This algorithm 
synthesises transparent Dolev-Yao attackers who interact with the agents in the 
system in order to discover some set of secret information. Restricting the syntax 
used to specify the protocol allows the algorithm to manage the growth of the 
state space. The heuristics used to manage the growth of the state space rely on 
the use of strict typing and on only allowing protocol specifications that do not 
deadlock (non-blocking evolutions). The algorithm has been implemented and 
tested using the Needham-Schroeder and Otway-Rees protocols. The algorithm 
can quickly find an attack on these protocols. Finding all the attacks on the 
Otway-Rees protocol takes approximately 78 minutes on a low to average spec- 
ification machine. This attack requires parallel sessions of the protocol which 
leads to a very large state space. Spycatcher can be used to quickly search for 
a single attack on a protocol. Knowledge of the structure of such an attack will 
enable the protocol designers to repair the protocol. 

Future work on the algorithm will remove the current constraint that forces
keys to be atomic terms. Allowing keys to be non-atomic will expand the types
of protocols that can be analysed. In addition, since the Spycatcher algorithm
requires a definition of the initial knowledge of a spy prior to constructing an
attack, another line of research will focus on determining a minimal initial
knowledge set required for given types of attacks. Initially we are investigating
the use of weakest precondition analysis using the spi calculus.
Abstract. The complementation problem for nondeterministic word
automata has numerous applications in formal verification. In particular,
the language-containment problem, to which many verification problems
are reduced, involves complementation. For automata on finite words,
which correspond to safety properties, complementation involves
determinization. The 2^n blow-up that is caused by the subset construction is
justified by a tight lower bound. For Büchi automata on infinite words,
which are required for the modeling of liveness properties, optimal
complementation constructions are quite complicated, as the subset
construction is not sufficient. From a theoretical point of view, the problem is
considered solved since 1988, when Safra came up with a determinization
construction for Büchi automata, leading to a 2^{O(n log n)} complementation
construction, and Michel came up with a matching lower bound. A careful
analysis, however, of the exact blow-up in Safra’s and Michel’s bounds
reveals an exponential gap in the constants hiding in the O() notations:
while the upper bound on the number of states in Safra’s complementary
automaton is n^{2n}, Michel’s lower bound involves only an n! blow-up,
which is roughly (n/e)^n. The exponential gap exists also in more recent
complementation constructions. In particular, the upper bound on the
number of states in the complementation construction in [KV01], which
avoids determinization, is (6n)^n. This is in contrast with the case of
automata on finite words, where the upper and lower bounds coincide.

In this work we describe an improved complementation construction for
nondeterministic Büchi automata and analyze its complexity. We show
that the new construction results in an automaton with at most (1.06n)^n
states. While this leaves the problem of the exact blow-up open, the
gap is now exponentially smaller. From a practical point of view, our
solution enjoys the simplicity of [KV01], and results in much smaller
automata.
1 Introduction



The complementation problem for nondeterministic word automata has
numerous applications in formal verification. In order to check that the language of
an automaton A_1 is contained in the language of a second automaton A_2, one
checks that the intersection of A_1 with an automaton that complements A_2 is
empty. Many problems in verification and design are reduced to language
containment. In model checking, the automaton A_1 corresponds to the system, and
the automaton A_2 corresponds to the property we wish to verify [Kur94,VW94].
While it is easy to complement properties given in terms of formulas in temporal
logic, complementation of properties given in terms of automata is not simple.
Indeed, a word w is rejected by a nondeterministic automaton A if all the runs of
A on w reject the word. Thus, the complementary automaton has to consider
all possible runs, and complementation has the flavor of determinization. For
automata on finite words, determinization, and hence also complementation, is
done via the subset construction [RS59]. Accordingly, if we start with a
nondeterministic automaton with n states, the complementary automaton may have
2^n states. The exponential blow-up that is caused by the subset construction is
justified by a tight lower bound: it is proved in [SS78] that for every n > 1, there
exists a language L_n that is recognized by a nondeterministic automaton with n
states, yet a nondeterministic automaton for the complement of L_n has at least
2^n states (see also [Bir93], for a similar result, in which the languages are
over an alphabet of size 4).
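For finite words, complementation via the subset construction can be sketched in a few lines. The encoding below (a transition dictionary keyed by state and letter) and the example automaton are assumptions made for illustration.

```python
from itertools import chain


def complement_nfa(alphabet, delta, initial, accepting):
    """Determinize an NFA by the subset construction and flip acceptance.

    delta maps (state, letter) to a set of successor states.
    Returns (states, transitions, start, accepting_subsets) of a DFA
    for the complement language.
    """
    start = frozenset(initial)
    states, transitions, todo = {start}, {}, [start]
    while todo:
        subset = todo.pop()
        for letter in alphabet:
            target = frozenset(chain.from_iterable(
                delta.get((q, letter), ()) for q in subset))
            transitions[(subset, letter)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    # A subset is accepting in the complement iff it holds no accepting NFA state.
    comp_accepting = {s for s in states if not (s & set(accepting))}
    return states, transitions, start, comp_accepting


def dfa_accepts(dfa, word):
    _, transitions, current, accepting = dfa
    for letter in word:
        current = transitions[(current, letter)]
    return current in accepting


# NFA over {a, b} accepting the words whose second-to-last letter is 'a'.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
comp = complement_nfa("ab", delta, {0}, {2})
```

The complement automaton here has four reachable subset states, within the 2^n = 8 bound for the three-state input.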

For Büchi automata on infinite words, which are required for the modeling
of liveness properties, optimal complementation constructions are quite
complicated, as the subset construction is not sufficient. Due to the lack of a simple
complementation construction, the user is typically required to specify the
property by a deterministic Büchi automaton [Kur94] (it is easy to complement a
deterministic Büchi automaton), or to supply the automaton for the negation
of the property [Hol97]. Similarly, specification formalisms like ETL [Wol83],
which have automata within the logic, involve complementation of automata, and
the difficulty of complementing Büchi automata is an obstacle to practical use
[AFF+02]. In fact, even when the properties are specified in LTL,
complementation is useful: the translators from LTL into automata have reached a remarkable
level of sophistication (cf. [DGV99,SB00,GO01,GBS02]). Even though
complementation of the automata is not explicitly required, the translations are so
involved that it is useful to check their correctness, which involves
complementation^1. Complementation is interesting in practice also because it enables
refinement and optimization techniques that are based on language containment



^1 For an LTL formula ψ, one typically checks that both the intersection of A_ψ with
A_¬ψ and the intersection of their complementary automata are empty.
rather than simulation^2. Thus, an effective algorithm for the complementation of
Büchi automata would be of significant practical value. Efforts to develop simple
complementation constructions for nondeterministic automata started early in
the 60s, motivated by decision problems of second-order logics. Büchi suggested
a complementation construction for nondeterministic Büchi automata that
involved a complicated combinatorial argument and a doubly-exponential blow-up
in the state space [Büc62]. Thus, complementing an automaton with n states
resulted in an automaton with 2^{2^{O(n)}} states. In [SVW87], Sistla et al. suggested
an improved construction, with only 2^{O(n^2)} states, which is still, however, not
optimal. Only in [Saf88] did Safra introduce a determinization construction, which
also enabled a 2^{O(n log n)} complementation construction, matching a lower bound
described by Michel [Mic88]. Thus, from a theoretical point of view, the problem
is considered solved since 1988. A careful analysis, however, of the exact
blow-up in Safra’s and Michel’s bounds reveals an exponential gap in the constants
hiding in the O() notations: while the upper bound on the number of states
in the complementary automaton constructed by Safra is n^{2n}, Michel’s lower
bound involves only an n! blow-up, which is roughly (n/e)^n. The exponential
gap exists also in more recent complementation constructions. In particular, the
upper bound on the number of states in the complementation construction in
[KV01], which avoids determinization, is (6n)^n. This is in contrast with the case
of automata on finite words, where, as mentioned above, the upper and lower
bounds coincide.

In this work we describe an improved complementation construction for
nondeterministic Büchi automata and analyze its complexity. The construction is
based on new observations on runs of nondeterministic Büchi automata: a run
of a nondeterministic Büchi automaton A is accepting if it visits the set α of
accepting states infinitely often. Accordingly, A rejects a word w if every run of
A visits α only finitely often. The runs of A can be arranged in a DAG (directed
acyclic graph). It is shown in [KV01] that A rejects w iff it is possible to label
the vertices of the DAG by ranks in 0, ..., 2n so that some local conditions on
the ranks of vertices and their successors are met. Intuitively, as in the progress
measure of [Kla91], the ranks measure the distance to a position from which no
states in α are visited. We show here that the ranks that label vertices of the
same level in the DAG have an additional property: starting from some limit level
l_lim ≥ 0, if a vertex in level l ≥ l_lim is labeled by an odd rank j, then all the odd
ranks in 1, ..., j label vertices in level l. It follows that the complementary
automaton, which considers all the possible level rankings (i.e., ranks that vertices
of some level in the DAG are labeled with), may restrict attention to a special

^2 Since complementation of Büchi automata is complicated, current research is focused
on ways in which fair simulation can approximate language containment [HKR02],
and ways in which the complementation construction can be circumvented by
manually bridging the gap between fair simulation and language containment [KPP03].
class of level rankings. Using some estimates on the asymptotics of Stirling
numbers of the first kind we are able to bound the size of this class and describe
a complementation construction with only (3cn)^n states, for c < 0.63. We then
tighten the analysis further and show that our complementary automaton has
at most (1.06n)^n states. While this leaves open the problem of the exact blow-up
that complementation involves, the gap is now exponentially smaller:
instead of an upper bound of (6n)^n states, we now have at most (1.06n)^n states.
From a practical point of view, our solution enjoys the simplicity of [KV01],
and results in much smaller automata. Moreover, the optimization constructions
described in [GKSV03] for the construction of [KV01] can be applied also in our
new construction, leading in practice to further reduction of the state space.
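To get a feel for the size of these gaps, the bounds quoted above can be compared on a logarithmic scale. The function below is only a numerical illustration; its name and dictionary keys are chosen here for convenience.

```python
import math


def log2_state_bounds(n: int) -> dict:
    """Base-2 logarithms of the state-count bounds discussed in the text."""
    return {
        "safra": 2 * n * math.log2(n),               # n^(2n)
        "kv01": n * math.log2(6 * n),                # (6n)^n
        "new": n * math.log2(1.06 * n),              # (1.06n)^n
        "michel": math.lgamma(n + 1) / math.log(2),  # n!  (lower bound)
    }


bounds = log2_state_bounds(10)
# The improvement over (6n)^n is a factor of (6/1.06)^n, i.e. about 2.5 bits per state.
gap_bits = bounds["kv01"] - bounds["new"]
```

For n = 10 the four values are ordered michel < new < kv01 < safra, and the saving relative to (6n)^n grows linearly in n on this scale, which is what "exponentially smaller" means for the state count itself.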

2 Preliminaries 

Given an alphabet Σ, an infinite word over Σ is an infinite sequence w = σ_0 · σ_1 ·
σ_2 ··· of letters in Σ. An automaton on infinite words is A = (Σ, Q, Q_in, ρ, α),
where Σ is the input alphabet, Q is a finite set of states, ρ : Q × Σ → 2^Q is
a transition function, Q_in ⊆ Q is a set of initial states, and α is an acceptance
condition (a condition that defines a subset of Q^ω). Intuitively, ρ(q, σ) is the
set of states that A can move into when it is in state q and it reads the letter
σ. Since the transition function of A may specify many possible transitions for
each state and letter, A is not deterministic. If |Q_in| = 1 and ρ is such that for
every q ∈ Q and σ ∈ Σ, we have that |ρ(q, σ)| = 1, then A is a deterministic
automaton.

A run of A on w is a function r : ℕ → Q where r(0) ∈ Q_in (i.e., the run starts
in an initial state) and for every l ≥ 0, we have r(l + 1) ∈ ρ(r(l), σ_l) (i.e., the
run obeys the transition function). In automata over finite words, acceptance
is defined according to the last state visited by the run. When the words are
infinite, there is no such thing as a “last state”, and acceptance is defined according
to the set Inf(r) of states that r visits infinitely often, i.e.,

Inf(r) = {q ∈ Q : for infinitely many l ∈ ℕ, we have r(l) = q}.

As Q is finite, it is guaranteed that Inf(r) ≠ ∅. The way we refer to Inf(r)
depends on the acceptance condition of A. In Büchi automata, α ⊆ Q, and r
is accepting iff Inf(r) ∩ α ≠ ∅. Dually, in co-Büchi automata, α ⊆ Q, and r is
accepting iff Inf(r) ∩ α = ∅.
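For an ultimately periodic run, i.e. a finite prefix followed by a cycle repeated forever, Inf(r) is exactly the set of states on the cycle, so both acceptance conditions can be checked directly. The sketch below relies on that assumption; the function names are illustrative.

```python
def inf_states(prefix, cycle):
    """Inf(r) for the run prefix · cycle^ω: the states visited infinitely often."""
    if not cycle:
        raise ValueError("an infinite run needs a non-empty cycle")
    return set(cycle)


def buchi_accepting(prefix, cycle, alpha):
    """Büchi: some state of alpha is visited infinitely often."""
    return bool(inf_states(prefix, cycle) & set(alpha))


def co_buchi_accepting(prefix, cycle, alpha):
    """co-Büchi: states of alpha are visited only finitely often."""
    return not (inf_states(prefix, cycle) & set(alpha))


alpha = {"q1"}
run_a = (["q0", "q1"], ["q2"])        # visits q1 once, then stays in q2
run_b = (["q0"], ["q1", "q2"])        # returns to q1 forever
```

The run run_a is co-Büchi accepting but not Büchi accepting, and run_b is the reverse, which illustrates the duality of the two conditions.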

Since A is not deterministic, it may have many runs on w. In contrast, a 
deterministic automaton has a single run on w. There are two dual ways in which 
we can refer to the many runs. When A is an existential automaton (or simply 
a nondeterministic automaton, as we shall call it in the sequel), it accepts an 
input word w iff there exists an accepting run of A on w. When A is a universal
automaton, it accepts an input word w iff all the runs of A on w are accepting. 
We use three-letter acronyms to describe types of automata. The first letter
describes the transition structure and is one of “D” (deterministic), “N”
(nondeterministic), and “U” (universal). The second letter describes the acceptance
condition; in this paper we only consider “B” (Büchi) and “C” (co-Büchi). The
third letter describes the objects on which the automata run; in this paper we are
only concerned with “W” (infinite words). Thus, for example, NBW designates
a nondeterministic Büchi word automaton and UCW designates a universal
co-Büchi word automaton.

In [KV01], we suggested the following approach for NBW complementation:
in order to complement an NBW, first dualize the transition function and the 
acceptance condition, and then translate the resulting UCW automaton back 
to a nondeterministic one. By [MS87], the dual automaton accepts the comple- 
mentary language, and so does the nondeterministic automaton we end up with. 
Thus, rather than determinization, complementation is based on a translation 
of universal automata to nondeterministic ones, which turned out to be much 
simpler. 

Consider a UCW A = (Σ, Q, Q_in, δ, α). The runs of A on a word w = σ_0 ·
σ_1 ··· can be arranged in an infinite DAG (directed acyclic graph) G_r = (V, E),
where

- V ⊆ Q × ℕ is such that ⟨q, l⟩ ∈ V iff some run of A on w has r(l) = q. For
example, the first level of G_r contains the vertices Q_in × {0}.

- E ⊆ ⋃_{l≥0} (Q × {l}) × (Q × {l+1}) is such that E(⟨q, l⟩, ⟨q', l+1⟩) iff ⟨q, l⟩ ∈ V
and q' ∈ δ(q, σ_l).

Thus, G_r embodies exactly all the runs of A on w. We call G_r the run DAG of
A on w, and we say that G_r is accepting if all its paths satisfy the acceptance
condition α. Note that A accepts w iff G_r is accepting. We say that a vertex
⟨q', l'⟩ is a successor of a vertex ⟨q, l⟩ iff E(⟨q, l⟩, ⟨q', l'⟩). We say that ⟨q', l'⟩
is reachable from ⟨q, l⟩ iff there exists a sequence ⟨q_0, l_0⟩, ⟨q_1, l_1⟩, ⟨q_2, l_2⟩, ... of
successive vertices such that ⟨q, l⟩ = ⟨q_0, l_0⟩, and there exists i ≥ 0 such that
⟨q', l'⟩ = ⟨q_i, l_i⟩. For a set S ⊆ Q, we say that a vertex ⟨q, l⟩ of G_r is an S-vertex
iff q ∈ S.
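A finite prefix of the run DAG can be built level by level directly from the definition of V and E. The sketch assumes the transition function is given as a dictionary from (state, letter) to a set of states; run_dag_prefix and the two-state example are illustrative.

```python
def run_dag_prefix(delta, initial, word_prefix):
    """Build the first len(word_prefix) + 1 levels of the run DAG.

    Returns (levels, edges) where levels[l] is the set of states q with
    (q, l) in V, and edges holds pairs ((q, l), (q2, l + 1)).
    """
    level = frozenset(initial)
    levels, edges = [level], []
    for l, letter in enumerate(word_prefix):
        next_level = set()
        for q in level:
            for q2 in delta.get((q, letter), ()):
                next_level.add(q2)
                edges.append(((q, l), (q2, l + 1)))
        level = frozenset(next_level)
        levels.append(level)
    return levels, edges


# State 0 may stay or move to 1 on 'a'; state 1 loops on 'a'.
delta = {(0, "a"): {0, 1}, (1, "a"): {1}}
levels, edges = run_dag_prefix(delta, {0}, "aa")
```

Each level contains at most n = |Q| states, which is the width bound used in the sequel.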

Consider a (possibly finite) DAG G ⊆ G_r. We say that a vertex ⟨q, l⟩ is finite
in G if only finitely many vertices in G are reachable from ⟨q, l⟩. For a set S ⊆ Q,
we say that a vertex ⟨q, l⟩ is S-free in G if all the vertices in G that are reachable
from ⟨q, l⟩ are not S-vertices. Note that, in particular, an S-free vertex is not an
S-vertex. We say that a level l of G is of width d ≥ 0 if there are d vertices of
the form ⟨q, l⟩ in G. Finally, the width of G is the maximal d ≥ 0 such that there
are infinitely many levels l of width d. The α-less width of a level of G is defined
similarly, restricted to vertices ⟨q, l⟩ for which q ∉ α. Note that the width of G_r
is at most n and the α-less width of G_r is at most n − |α|.

Runs of UCW were studied in [KV01]. For x ∈ ℕ, let [x] denote the set
{0, 1, ..., x}, and let [x]^odd and [x]^even denote the set of odd and even members
of [x], respectively. A co-Büchi-ranking for G_r (C-ranking, for short) is a function
f : V → [2n] that satisfies the following two conditions:

1. For all vertices ⟨q, l⟩ ∈ V, if f(⟨q, l⟩) is odd, then q ∉ α.

2. For all edges (⟨q, l⟩, ⟨q', l+1⟩) ∈ E, we have f(⟨q', l+1⟩) ≤ f(⟨q, l⟩).

Thus, a C-ranking associates with each vertex in G_r a rank in [2n] so that the
ranks along paths do not increase, and α-vertices get only even ranks. We say
that a vertex ⟨q, l⟩ is an odd vertex if f(⟨q, l⟩) is odd. Note that each path in G_r
eventually gets trapped in some rank. We say that the C-ranking f is an odd
C-ranking if all the paths of G_r eventually get trapped in an odd rank. Formally,
f is odd iff for all paths ⟨q_0, 0⟩, ⟨q_1, 1⟩, ⟨q_2, 2⟩, ... in G_r, there is l ≥ 0 such that
f(⟨q_l, l⟩) is odd, and for all l' ≥ l, we have f(⟨q_l', l'⟩) = f(⟨q_l, l⟩). Note that,
equivalently, f is odd if every path of G_r has infinitely many odd vertices.

Lemma 1. [KV01] The following are equivalent.

1. All the paths of $G_r$ have only finitely many $\alpha$-vertices.

2. There is an odd C-ranking for $G_r$.



Proof: Assume first that there is an odd C-ranking for $G_r$. Then, every path
in $G_r$ eventually gets trapped in an odd rank. Hence, as $\alpha$-vertices get only even
ranks, all the paths of $G_r$ visit $\alpha$ only finitely often, and we are done.

For the other direction, given an accepting run DAG $G_r$, we define an infinite
sequence $G_0 \supseteq G_1 \supseteq G_2 \supseteq \cdots$ of DAGs inductively as follows.

- $G_0 = G_r$.

- $G_{2i+1} = G_{2i} \setminus \{(q,l) \mid (q,l) \text{ is finite in } G_{2i}\}$.

- $G_{2i+2} = G_{2i+1} \setminus \{(q,l) \mid (q,l) \text{ is } \alpha\text{-free in } G_{2i+1}\}$.

It is shown in [KV01] that for every $i \geq 0$, the transition from $G_{2i+1}$ to $G_{2i+2}$
involves the removal of an infinite path from $G_{2i+1}$. Since the width of $G_0$ is
bounded by $n$, it follows that the width of $G_{2i}$ is at most $n - i$. Hence, $G_{2n}$ is
finite, and $G_{2n+1}$ is empty. In fact, as argued in [GKSV03], the $\alpha$-less width of
$G_{2i}$ is at most $n - (|\alpha| + i)$, implying that $G_{2(n-|\alpha|)+1}$ is already empty. Since
$|\alpha| \geq 1$, we can therefore assume that $G_{2n-1}$ is empty.

Each vertex $(q,l)$ in $G_r$ has a unique index $i \geq 0$ such that $(q,l)$ is either
finite in $G_{2i}$ or $\alpha$-free in $G_{2i+1}$. Thus, the sequence of DAGs induces a function
$\mathit{rank} : V \to [2n-2]$, defined as follows.

$$\mathit{rank}(q,l) = \begin{cases} 2i & \text{if } (q,l) \text{ is finite in } G_{2i}, \\ 2i+1 & \text{if } (q,l) \text{ is } \alpha\text{-free in } G_{2i+1}. \end{cases}$$

It is shown in [KV01] that the function $\mathit{rank}$ is an odd C-ranking.



□ 



We now use C-rankings in order to translate UCW to NBW:

Theorem 1. [KV01,GKSV03] Let $A$ be a UCW with $n$ states. There is an NBW
$A'$ with at most $3^n \cdot (2n-1)^n$ states such that $L(A') = L(A)$.

Proof: Let $A = \langle \Sigma, Q, Q_{in}, \delta, \alpha \rangle$. When $A'$ reads a word $w$, it guesses an odd
C-ranking for the run DAG $G_r$ of $A$ on $w$. At a given point of a run of $A'$, it
keeps in its memory a whole level of $G_r$ and a guess for the rank of the vertices
at this level. In order to make sure that all the paths of $G_r$ visit infinitely many
odd vertices, $A'$ remembers the set of states that owe a visit to an odd vertex.

Before we define $A'$, we need some notation. A level ranking for $A$ is a
function $g : Q \to [2n-2]$, such that if $g(q)$ is odd, then $q \notin \alpha$. Let $\mathcal{R}$ be the set
of all level rankings. For a subset $S$ of $Q$ and a letter $\sigma$, let $\delta(S,\sigma) = \bigcup_{s \in S} \delta(s,\sigma)$.

Note that if level $l$ in $G_r$, for $l \geq 0$, contains the states in $S$, and the $(l+1)$-th
letter in $w$ is $\sigma$, then level $l+1$ of $G_r$ contains the states in $\delta(S,\sigma)$.

For two level rankings $g$ and $g'$ in $\mathcal{R}$, a set $S \subseteq Q$, and a letter $\sigma$, we say that
$g'$ covers $\langle g,S,\sigma \rangle$ if for all $q \in S$ and $q' \in \delta(q,\sigma)$, we have $g'(q') \leq g(q)$. Thus,
if the vertices of level $l$ contain exactly all the states in $S$, $g$ describes the ranks
of these vertices, and the $(l+1)$-th letter in $w$ is $\sigma$, then $g'$ is a possible level
ranking for level $l+1$. Finally, for $g \in \mathcal{R}$, let $\mathit{odd}(g) = \{q : g(q) \in [2n-2]^{odd}\}$.
Thus, a state of $Q$ is in $\mathit{odd}(g)$ if it has an odd rank.

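Now, $A' = \langle \Sigma, Q', Q'_{in}, \delta', \alpha' \rangle$, where

- $Q' = 2^Q \times 2^Q \times \mathcal{R}$, where a state $\langle S,O,g \rangle \in Q'$ indicates that the current
level of the DAG contains the states in $S$, the set $O \subseteq S$ contains states along
paths that have not visited an odd vertex since the last time $O$ has been
empty, and $g$ is the guessed level ranking for the current level.

- $Q'_{in} = \{Q_{in}\} \times \{\emptyset\} \times \mathcal{R}$.

- $\delta'$ is defined, for all $\langle S,O,g \rangle \in Q'$ and $\sigma \in \Sigma$, as follows.

• If $O \neq \emptyset$, then

$\delta'(\langle S,O,g \rangle, \sigma) = \{\langle \delta(S,\sigma), \delta(O,\sigma) \setminus \mathit{odd}(g'), g' \rangle : g' \text{ covers } \langle g,S,\sigma \rangle\}$.

• If $O = \emptyset$, then

$\delta'(\langle S,O,g \rangle, \sigma) = \{\langle \delta(S,\sigma), \delta(S,\sigma) \setminus \mathit{odd}(g'), g' \rangle : g' \text{ covers } \langle g,S,\sigma \rangle\}$.

- $\alpha' = 2^Q \times \{\emptyset\} \times \mathcal{R}$.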

Consider a state $\langle S,O,g \rangle \in Q'$. Since $O \subseteq S$, there are at most $3^n$ pairs $S$ and
$O$ that can be members of the same state. In addition, since there are at most
$(2n-1)^n$ level rankings, the number of states in $A'$ is at most $3^n \cdot (2n-1)^n$. □
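
To make the ingredients of this construction concrete, the following is a minimal Python sketch, not code from [KV01] or [GKSV03]. It enumerates level rankings, tests the "covers" relation, and computes one step of $\delta'$. The two-state automaton (`STATES`, `ALPHA`, `DELTA`) and all function names are hypothetical choices made only for illustration.

```python
from itertools import product

def level_rankings(states, alpha):
    """Enumerate all g : Q -> {0, ..., 2n-2} in which states of alpha get even ranks."""
    n = len(states)
    choices = [
        [r for r in range(2 * n - 1) if r % 2 == 0 or q not in alpha]
        for q in states
    ]
    return [dict(zip(states, combo)) for combo in product(*choices)]

def post(delta, S, sigma):
    """delta(S, sigma): union of the successors of the states in S."""
    return frozenset(q2 for q in S for q2 in delta.get((q, sigma), ()))

def covers(g_next, g, S, sigma, delta):
    """g_next covers <g, S, sigma>: ranks do not increase along edges leaving S."""
    return all(g_next[q2] <= g[q] for q in S for q2 in delta.get((q, sigma), ()))

def odd(g):
    """States that g maps to an odd rank."""
    return frozenset(q for q, r in g.items() if r % 2 == 1)

def successors(state, sigma, delta, rankings):
    """One step of delta' on a state <S, O, g>, following the two cases of the proof."""
    S, O, g = state
    S_next = post(delta, S, sigma)
    # If O is empty, the obligation set is restarted from the whole next level.
    owing = post(delta, O, sigma) if O else S_next
    return [(S_next, owing - odd(g2), g2)
            for g2 in rankings if covers(g2, g, S, sigma, delta)]

# Hypothetical two-state UCW over the one-letter alphabet {"a"}.
STATES = ["q0", "q1"]
ALPHA = {"q1"}
DELTA = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): {"q1"}}

RANKINGS = level_rankings(STATES, ALPHA)
START = (frozenset({"q0"}), frozenset(), {"q0": 1, "q1": 0})
NEXT = successors(START, "a", DELTA, RANKINGS)
```

In this toy instance the number of level rankings stays below the $(2n-1)^n$ used in the counting argument, because states in $\alpha$ may only take even ranks.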

Note that since $\lim_{n\to\infty}(1 + \frac{x}{n})^n = e^x$, the $3^n \cdot (2n-1)^n$ bound in Theorem 1
is asymptotically equal to $(6n)^n/\sqrt{e}$. (Since $\sqrt{e}$ is a constant, we sometimes refer to
the bound as $(6n)^n$.)
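
For completeness, the step behind this remark can be written out as follows. This is a routine calculation added here as a sketch, using only the standard limit for the exponential function.

```latex
\[
3^n (2n-1)^n \;=\; 3^n (2n)^n \left(1 - \frac{1}{2n}\right)^{n}
            \;=\; (6n)^n \left(1 - \frac{1}{2n}\right)^{n},
\qquad
\lim_{n\to\infty} \left(1 - \frac{1}{2n}\right)^{n} \;=\; e^{-1/2} \;=\; \frac{1}{\sqrt{e}} .
\]
```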

Corollary 1. Let $A$ be an NBW with $n$ states. There is an NBW $\tilde{A}$ with at
most $(6n)^n/\sqrt{e}$ states such that $L(\tilde{A}) = \Sigma^\omega \setminus L(A)$.

3 An Improved Upper Bound

In this section we show how the $3^n \cdot (2n-1)^n$ bound described in Section 2 can
be improved.

Consider a UCW $A$ and a word $w \in \Sigma^\omega$ accepted by $A$. For the run $r$ of $A$
on $w$, let max-rank($r$) be the maximal rank that a vertex in $G_r$ gets.

Lemma 2. There is a level $l \geq 0$ such that for each level $l' > l$, and for all
ranks $j \in [\text{max-rank}(r)]^{odd}$, there is a vertex $(q,l')$ such that $\mathit{rank}(q,l') = j$.

Proof: Let $k$ be the minimal index for which $G_{2k}$ is finite. For every $0 \leq
i \leq k-1$, the DAG $G_{2i+1}$ contains an $\alpha$-free vertex. Let $l_i$ be the minimal
level such that $G_{2i+1}$ has an $\alpha$-free vertex $(q,l_i)$. Since $(q,l_i)$ is in $G_{2i+1}$, it
is not finite in $G_{2i}$. Thus, there are infinitely many vertices in $G_{2i}$ that are
reachable from $(q,l_i)$. Hence, by König's Lemma, $G_{2i}$ contains an infinite path
$(q,l_i),(q_1,l_i+1),(q_2,l_i+2),\ldots$. For all $j \geq 1$, the vertex $(q_j,l_i+j)$ has infinitely
many vertices reachable from it in $G_{2i}$ and thus, it is not finite in $G_{2i}$. Therefore,
the path $(q,l_i),(q_1,l_i+1),(q_2,l_i+2),\ldots$ exists also in $G_{2i+1}$. Recall that $(q,l_i)$
is $\alpha$-free. Hence, being reachable from $(q,l_i)$, all the vertices $(q_j,l_i+j)$ in the
path are $\alpha$-free as well. It follows that for every $0 \leq i \leq k-1$ there exists a level
$l_i$ such that for all $l' \geq l_i$, there is a vertex $(q,l')$ that is $\alpha$-free in $G_{2i+1}$, and
for which $\mathit{rank}(q,l')$ would therefore be $2i+1$. Since the maximal odd member
in $[\text{max-rank}(r)]^{odd}$ is $2k-1$, taking $l = \max_{0 \leq i \leq k-1}\{l_i\}$ satisfies the lemma's
requirements. □

Recall that a level ranking for $A$ is a function $g : Q \to [2n-2]$, such that
if $g(q)$ is odd, then $q \notin \alpha$. Let max-odd($g$) be the maximal odd number in the
range of $g$.

Definition 1. We say that a level ranking g is tight if 

1. the maximal rank in the range of g is odd, and 

2. for all $j \in [\text{max-odd}(g)]^{odd}$, there is a state $q \in Q$ with $g(q) = j$.
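
As an illustration of Definition 1, here is a small Python sketch (our own, with hypothetical state names) that tests tightness and enumerates the tight level rankings of a three-state automaton by brute force. It assumes the level-ranking convention recalled above: ranks lie in $\{0,\ldots,2n-2\}$ and states in $\alpha$ receive even ranks.

```python
from itertools import product

def is_tight(g):
    """Definition 1: the maximal rank is odd and every odd rank below it is used."""
    ranks = set(g.values())
    top = max(ranks)
    return top % 2 == 1 and all(j in ranks for j in range(1, top + 1, 2))

def tight_level_rankings(states, alpha):
    """Brute-force enumeration of the tight level rankings g : Q -> {0, ..., 2n-2}."""
    n = len(states)
    choices = [
        [r for r in range(2 * n - 1) if r % 2 == 0 or q not in alpha]
        for q in states
    ]
    candidates = (dict(zip(states, combo)) for combo in product(*choices))
    return [g for g in candidates if is_tight(g)]

# Hypothetical three-state example, once without and once with an accepting state.
STATES = ["q0", "q1", "q2"]
TIGHT_NO_ALPHA = tight_level_rankings(STATES, set())
TIGHT_WITH_ALPHA = tight_level_rankings(STATES, {"q2"})
```

For three states there are $5^3 = 125$ functions into $\{0,\ldots,4\}$, so even in this tiny case the tightness requirement removes most candidates.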

Lemma 3. There is a level $l \geq 0$ such that for each level $l' > l$, the level ranking
that corresponds to $l'$ is tight.

Proof: Lemma 2 implies that for all the levels $l'$ beyond some level $l_1$, the level
ranking that corresponds to $l'$ satisfies the second condition in Definition 1. Let
$g$ be the level ranking in level $l_1$. Since even ranks label finite vertices, only a
finite number of levels $l' > l_1$ have even ranks greater than max-odd($g$) in their
range. The level $l$ required in the lemma is then the level beyond $l_1$ in which
these even ranks "evaporate". □

We refer to the minimal level $l$ that satisfies the conditions in Lemma 3 as
the limit level of $r$, denoted limit($r$).

It follows that we can improve the construction described in the proof of
Theorem 1 and restrict the set $\mathcal{R}$ of possible level rankings to the set of tight
level rankings. Since, however, the tightness of the level ranking is guaranteed
only beyond the limit level of $r$, we also need to guess this level, and proceed
with the usual subset construction until we reach it. Formally, we suggest the
following modified construction.

Theorem 2. Let $A$ be a UCW with $n$ states. Let tight($n$) be the number of tight
level rankings. There is an NBW $A'$ with at most $3^n \cdot \mathit{tight}(n)$ states such that
$L(A') = L(A)$.

Proof: Let $A = \langle \Sigma, Q, Q_{in}, \delta, \alpha \rangle$, and let $\mathcal{R}_{tight}$ be the set of tight level rankings
for $A$. Then, $A' = \langle \Sigma, Q', Q'_{in}, \delta', \alpha' \rangle$, where

- $Q' = 2^Q \cup (2^Q \times 2^Q \times \mathcal{R}_{tight})$, where a state $S \in Q'$ indicates that the current
level of the DAG contains the states in $S$ (and is relevant for levels before
the limit of $r$), and a state $\langle S,O,g \rangle \in Q'$ is similar to the states in the
construction in the proof of Theorem 1 (and is relevant for levels beyond the
limit of $r$). In particular, $O \subseteq S$.

- $Q'_{in} = \{Q_{in}\}$. Thus, the run starts in a "subset mode", corresponding to a
guess that the limit level has not been reached yet.

- For all states in $Q'$ of the form $S \in 2^Q$ and $\sigma \in \Sigma$, we have that

$\delta'(S,\sigma) = \{\delta(S,\sigma)\} \cup \{\langle \delta(S,\sigma), O, g \rangle : O \subseteq \delta(S,\sigma) \text{ and } g \in \mathcal{R}_{tight}\}$.

Thus, at each point in the subset mode, $A'$ may guess that the current level
is the limit level, and move to a "subset+ranks" mode, where it proceeds
as the NBW constructed in the proof of Theorem 1. Thus, for states of the
form $\langle S,O,g \rangle$, the transition function is as described there, only that level
rankings are restricted to tight ones.

- $\alpha' = 2^Q \times \{\emptyset\} \times \mathcal{R}_{tight}$. Thus, as in the proof of Theorem 1, $A'$ is required to
visit infinitely many states in which the $O$ component is empty. In particular,
this forces $A'$ to eventually switch from the subset mode to the subset+ranks
mode.

We prove that $L(A') = L(A)$. In order to prove that $L(A) \subseteq L(A')$, we prove
that $L(A'') \subseteq L(A')$, for the NBW $A''$ constructed in Theorem 1, and for which
we know that $L(A'') = L(A)$. Consider a word $w \in \Sigma^\omega$ accepted by $A''$. Let $r''$
be the accepting run of $A''$ on $w$. By Lemma 3, the point limit($r$) exists, and
all the level rankings beyond limit($r$) are tight. Therefore, the run $r'$ obtained
from $r''$ by projecting the states corresponding to levels up to limit($r$) on their
$S$ component is a legal and accepting run of $A'$ on $w$.

It is left to prove that $L(A') \subseteq L(A)$. Consider a word $w \in \Sigma^\omega$ accepted by
$A'$. Let $G_r$ be the run DAG of $A$ on $w$. We prove that there is an odd C-ranking
$f : V \to [2n]$ for $G_r$. Then, by Lemma 1, $A$ accepts $w$. Let $r'$ be the accepting
run of $A'$ on $w$. By the definition of $\delta'$, the projection of the states of $r'$ on the
first $S$ component corresponds to the structure of $G_r$. Since the initial state of
$A'$ is $\{Q_{in}\}$ whereas $\alpha'$ contains only states in $2^Q \times 2^Q \times \mathcal{R}_{tight}$, there must be
a level $l$ in which $r'$ switches from the subset mode to the subset+ranks mode, and
from which, according to the definition of $\delta'$, it stays forever in that mode. The
tight level rankings that $r'$ describes for levels beyond $l$ induce the C-ranking
for vertices in these levels. For levels $l' < l$, we can define $f((q,l')) = 2n$ for all
$q \in Q$. Note that the ranks of all vertices in levels up to $l$ are even, and $f$ does not
increase up to this point. In addition, since the maximal element in the range of
a level ranking $g \in \mathcal{R}_{tight}$ is at most $2n-1$, the ranking $f$ does not increase also
in level $l$. Finally, since each path eventually reaches the point $l$, from which the
level rankings induce an odd C-ranking, $f$ is odd. □



Corollary 2. Let $A$ be an NBW with $n$ states. There is an NBW $\tilde{A}$ with at
most $3^n \cdot \mathit{tight}(n)$ states such that $L(\tilde{A}) = \Sigma^\omega \setminus L(A)$.



While the improved construction involves an extra copy of $2^Q$ in the state
space, the restriction to tight rank assignments is significant. Indeed, as we show
now, tight($n$) is bounded by $(cn)^n$, for $c < 0.63$, which gives a $(3cn)^n$ bound, for
$c < 0.63$, on the number of states of the complementary automaton.

The general idea is roughly as follows. Recall that we wish to bound from
above the number of tight level rankings - functions $f : Q \to [2n-2]$ that are
onto $[2\ell-1]^{odd}$, with $2\ell-1$ being the maximal number in the range of $f$. As a
first step we need a bound on the number of functions from the set $\{1,\ldots,n\}$
onto the set $\{1,\ldots,m\}$. This is nothing else but $m!$ times the Stirling number
of the second kind, $S(n,m)$. The asymptotics of these numbers is known, e.g. in
[Tem93], where the following is implicit.



Lemma 4 (Temme). The number of functions from $\{1,\ldots,n\}$ onto
$\{1,\ldots,\beta n\}$ is at most

$$[(1 + o(1))(M[\beta]\,n)]^n,$$

where 

Now, let 



p{£, n) = max 

r 




r -\- 1 
n 





(*) 



where the maximum is over all $r \leq \min(\ell, n-\ell)$. The value of $p(\ell,n)$ depends only
on the ratio $\ell/n$. To see this, note that if we allow $r$ to assume a real value rather
than only integer values, we still get an upper bound. Thus, we can assume that
$r = \alpha n$ and $\ell = \gamma n$ for some $\alpha$ and $\gamma$. Then, all the terms in the bound are
functions of $\alpha$ and $\gamma$, where we are maximizing on $\alpha$. Therefore, the bound we
get is only a function of $\gamma = \ell/n$. Let $h(\ell/n) = p(\ell,n)$. Then:

Theorem 3. The number of functions from $\{1,\ldots,n\}$ to $\{0,\ldots,2\ell-1\}$ that
are onto the $\ell$ odds is no more than $n\,[(1 + o(1))\,h(\ell/n)\,n]^n$.

Proof: Fixing $r$, one chooses which $r$ evens are going to be hit ($\binom{\ell}{r}$ possibilities)
and then chooses a function from $\{1,\ldots,n\}$ onto the set of the $\ell$ odds union with
the set of the $r$ chosen evens. Clearly, the number of such functions is equal to
the number of functions from $\{1,\ldots,n\}$ onto $\{1,\ldots,\ell+r\}$. By Lemma 4 and
Stirling's approximation we get the expression that appears in (*). Choosing the
"worst" $r$ gives us the upper bound. □
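
The counting step in this proof can be checked exactly for small parameters. The Python sketch below is our own illustration. It computes the number of onto functions by inclusion-exclusion, rather than by the asymptotic estimate of Lemma 4, and compares the resulting sum over $r$ with a brute-force count. All function names are hypothetical.

```python
from itertools import product
from math import comb

def surjections(n, m):
    """Number of functions from an n-set onto an m-set (inclusion-exclusion)."""
    return sum((-1) ** j * comb(m, j) * (m - j) ** n for j in range(m + 1))

def onto_odds(n, l):
    """Functions {1..n} -> {0..2l-1} that hit every odd value, summed over the
    number r of even values that are hit, as in the proof of Theorem 3."""
    return sum(comb(l, r) * surjections(n, l + r) for r in range(l + 1))

def onto_odds_brute(n, l):
    """The same quantity by explicit enumeration of all (2l)^n functions."""
    odds = set(range(1, 2 * l, 2))
    return sum(1 for f in product(range(2 * l), repeat=n) if odds <= set(f))
```

For instance, `onto_odds(4, 2)` and `onto_odds_brute(4, 2)` agree, which is a useful sanity check before trusting the asymptotic bound.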

Recall that a tight level ranking is a function $g : Q \to \{0,\ldots,2n-2\}$ such
that the maximal rank in the range of $g$ is some odd $2\ell-1$, and $g$ is onto the
$\ell$ odds $\{1,3,\ldots,2\ell-1\}$. Thus, the expression in Theorem 3 bounds the number
of tight level rankings as a function of $\ell/n$.

In Figure 1 we describe the behavior of $h(\ell/n)$ for $0 \leq \ell/n \leq 1$, as plotted by
Matlab. A simple analysis shows that the maximal value that $h(\ell/n)$ can get is
0.63, for $\ell \approx 0.5n$, implying an upper bound of $(0.63n)^n$ to the number of tight
level rankings.



Fig. 1. The function $h(\ell/n)$, for $0 \leq \ell/n \leq 1$ (horizontal axis: $\ell/n$).

4 A Tighter Analysis

In Section 3, we described an improved complementation construction and
showed that the state space of the complementary automaton is bounded by
$(3cn)^n$, for $c < 0.63$. Our analysis was based on the observation that the state
space of the complementary automaton consists of triples $\langle S,O,g \rangle$ in which $S$
and $O$ are subsets of the state space of the original automaton, with $O \subseteq S$, and
$g$ is a tight level ranking. Accordingly, we multiplied $3^n$ - the number of pairs
$\langle S,O \rangle$ as above - with tight($n$) - the number of tight level rankings. This analysis,
while significantly improving the known $(6n)^n/\sqrt{e}$ upper bound, ignores possible
relations between the pair $\langle S,O \rangle$ and the tight level ranking $g$ associated with
it. In this section we point to such relations and show that, indeed, they lead to
a tighter analysis.

Consider a state $\langle S,O,g \rangle$ of the NBW $A'$ constructed in Theorem 2. Note
that while $g : Q \to [2n-2]$ has $Q$ as its domain, we can, given $S$, restrict
attention to level rankings in which all states not in $S$ are mapped to 0. To see
this, note that the requirement about $g$ being a tight level ranking stays valid
for $g$ with $g(q) = 0$ for all $q \notin S$. Also, the definition of when a level ranking
covers another level ranking is parametrized with $S$. In addition, as $O$ maintains
the set of states that have not visited an odd vertex, $g$ maps all the states
in $O$ to an even rank.

Let tighter($n$) be the number of triples $\langle S,O,f \rangle$, with $O \subseteq S \subseteq \{1,\ldots,n\}$,
such that there exists $\ell$ so that $f : S \to \{1,\ldots,2\ell+1\}$ is onto the odds and $f(x)$
is even for $x \in O$. By the above discussion, we have the following.

Corollary 3. Let $A$ be an NBW with $n$ states. There is an NBW $\tilde{A}$ with at
most tighter($n$) states such that $L(\tilde{A}) = \Sigma^\omega \setminus L(A)$.

We now calculate tighter($n$) and conclude that the more careful analysis is
significant - the bound on the state space that follows from Corollary 3 is better 
than the one that follows from Corollary 2. 

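For a triple $\langle S,O,f \rangle$ as above, let $T \subseteq S$ be the inverse image of the odd
integers under $f$, and let $O' = S \setminus T$. Let $\alpha$, $\beta$, and $\gamma$ be such that $|S| = \alpha n$,
$|T| = \beta n$, and $\ell = \gamma n$. Also, let $\mathcal{F}_n(\alpha,\beta,\gamma)$ denote the number of triples $\langle S,O,f \rangle$
for a fixed triple $\langle \alpha,\beta,\gamma \rangle$. We are interested in $\sum_{\alpha,\beta,\gamma} \mathcal{F}_n(\alpha,\beta,\gamma)$, where $0 \leq \gamma \leq
\beta \leq \alpha \leq 1$, and all three numbers are integer multiples of $1/n$. Clearly, this is at
most $n^3 \max_{\alpha,\beta,\gamma} \mathcal{F}_n(\alpha,\beta,\gamma)$. Let us therefore compute $\mathcal{F}_n(\alpha,\beta,\gamma)$ for a fixed $\langle \alpha,\beta,\gamma \rangle$.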

In order to count, we start by choosing $S$, then choose $T$, next we choose the
value of $\ell$, then define $f$, and finally choose $O \subseteq O'$.

The number of ways to choose $S$ is $\binom{n}{\alpha n}$ which, using Stirling's factorial
approximation formula, is

$$[(1 + o(1))\,\alpha^{-\alpha}(1-\alpha)^{\alpha-1}]^n.$$

Fig. 2. Contour lines of $h'(\beta,\gamma)$



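Note that in the above calculation we should use the convention $0^0 = 1$. The
number of ways to choose $T$ inside $S$ is $\binom{\alpha n}{\beta n}$, which is approximately

$$[(1 + o(1))\,((\beta/\alpha)^{-\beta/\alpha}(1 - \beta/\alpha)^{\beta/\alpha - 1})^{\alpha}]^n.$$

The number of ways to choose the values of $f$ on $T$, according to Lemma 4, is

$$(M[\gamma/\beta]\,\beta n)^{\beta n}.$$

The number of ways to choose the values of $f$ for the elements of $O'$ is $\ell^{|O'|}$,
which is

$$(\gamma n)^{(\alpha-\beta)n}.$$

The number of choices for $O$ is

$$2^{|O'|} = 2^{(\alpha-\beta)n}.$$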

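Taking $n$-th roots, writing $\psi_n(\alpha,\beta,\gamma) = \sqrt[n]{\mathcal{F}_n(\alpha,\beta,\gamma)}$, and multiplying all of the above,
we get

$$\psi_n(\alpha,\beta,\gamma) = (1 + o(1))\,\alpha^{-\alpha}(1-\alpha)^{\alpha-1}\,((\beta/\alpha)^{-\beta/\alpha}(1 - \beta/\alpha)^{\beta/\alpha-1})^{\alpha}\,(M[\gamma/\beta]\beta)^{\beta}\,(2\gamma)^{\alpha-\beta}\,n^{\alpha}.$$

For fixed values of $\beta$ and $\gamma$, the asymptotic maximum of the above is achieved
for $\alpha = 1$. To see this, recall that $\alpha \leq 1$ and note that all the terms in $\psi_n(\alpha,\beta,\gamma)$,
except for $n^{\alpha}$, are bounded (as $n$ goes to infinity). Therefore, for any $\alpha < 1$, we
have that $\psi_n(\alpha,\beta,\gamma)$ behaves like $O(n^{\alpha})$, which is smaller than $n^1$, which is the
order of magnitude when $\alpha = 1$. Setting $\alpha = 1$, we get

$$\max_{\alpha} \psi_n(\alpha,\beta,\gamma) = (1 + o(1))\,\beta^{-\beta}(1-\beta)^{\beta-1}(M[\gamma/\beta]\beta)^{\beta}(2\gamma)^{1-\beta}\,n.$$

Since $\sqrt[n]{n^3} \to 1$, this is also the asymptotic value of $\sqrt[n]{\sum_{\alpha,\beta,\gamma} \mathcal{F}_n(\alpha,\beta,\gamma)}$.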

In Figure 2 we describe the behavior of $h'(\beta,\gamma)$ for $0 \leq \gamma \leq \beta \leq 1$, as plotted
by Matlab (in the figure, $\beta$ and $\gamma$ are indicated by b and c, respectively). A
simple analysis shows that the maximal value that $h'(\beta,\gamma)$ can get is 1.0512, for
$\beta \approx 0.6590$ and $\gamma \approx 0.4880$, implying an upper bound of $(1.06n)^n$ to tighter($n$).

Acknowledgment. We thank Raz Kupferman for Matlab-analysis services.

Abstract. Bounded model checking has received recent attention as an
efficient verification method. The basic idea behind this new method is 
to reduce the model checking problem to the propositional satisfiability 
decision problem or SAT. However, this method has rarely been applied 
to Petri nets, because the ordinary encoding would yield a large formula 
due to the concurrent and asynchronous nature of Petri nets. In this 
paper, we propose a new SAT-based verification method for safe Petri 
nets. This method can reduce verification time by representing the 
behavior by very succinct formulas. Through an experiment using a 
suite of Petri nets, we show the effectiveness of the proposed method. 

Keywords: Bounded model checking, SAT, Petri nets 



1 Introduction 

Model checking [5] is a powerful technique for verifying systems that are modeled 
as a finite state machine. The main challenge in model checking is to deal with 
the state space explosion problem, because the number of states can be very 
large for realistic designs. Recently bounded model checking has been receiving 
attention as a new solution to this problem [2,6]. The main idea is to look for 
counterexamples (or witnesses) that are shorter than some fixed length k for a 
given property. If a counterexample can be found, then it is possible to conclude 
that the property does not hold in the system. The key behind this approach is 
to reduce the model checking problem to the propositional satisfiability problem. 
The formula to be checked is constructed by unwinding the transition relation of 
the system k times such that satisfying assignments represent counterexamples. 

In the literature, it has been reported that this method can work efficiently, 
especially for the verification of digital circuits. An advantage of this method 
is that it works efficiently even when compact BDD representation cannot be 
obtained. It is also an advantage that the approach can exploit recent advances 
in decision procedures of satisfiability. 

On the other hand, this method does not work well for asynchronous sys- 
tems like Petri nets, because the encoding scheme into propositional formulas 
is not suited for such systems; it would require a large formula to represent the 
transition relation, thus resulting in large execution time and low scalability. 

To address this issue, approaches that use techniques other than SAT decision
procedures have been proposed in [8,9]. These approaches allow bounded model
checking of Petri nets by using answer set programming [9] and by using Boolean
circuit satisfiability checking [8].

In this paper, we tackle the same problem but in a different way; we propose 
a new verification method using ordinary SAT solvers. As in [8,9], we limit our 
discussions to verification of 1-bounded or safe Petri nets in this paper. The new 
method enhances the effectiveness of SAT-based verification in the following two 
ways. 

— Our method uses a much more succinct formula than the existing method.
This shortens the execution time of a SAT procedure.

— Our method allows the formula to represent counterexamples of length 
greater than k while guaranteeing to detect counterexamples of length k 
or less. This enlarges the state space that can be explored. 

To demonstrate the effectiveness of the approach, we show the results of applying 
it to a suite of Petri nets. 

The remainder of the paper is organized as follows. Section 2 describes the 
basic definition of Petri nets and how to represent them symbolically. Section 
3 explains our proposed method for reachability checking. Section 4 discusses 
liveness checking. In Section 5 a pre-processing procedure is proposed. Experi- 
mental results are presented in Section 6. Section 7 concludes the paper with a 
summary and directions for future work. 



2 Preliminaries 

2.1 Petri Nets 

A Petri net is a 4-tuple $(P, T, F, M_0)$ where $P = \{p_1, p_2, \ldots, p_m\}$ is a finite
set of places, $T = \{t_1, t_2, \ldots, t_n\}$ ($P \cap T = \emptyset$) is a finite set of transitions,
$F \subseteq (P \times T) \cup (T \times P)$ is a set of arcs, and $M_0$ is the initial state (marking).
The set of input places and the set of output places of $t$ are denoted by ${}^\bullet t$ and
$t^\bullet$, respectively.

We define a relation $\xrightarrow{t}$ over states as follows: $S \xrightarrow{t} S'$ iff the transition $t \in T$ is
enabled at $S$ and $S'$ is the state that results from firing it. Also we define
a computation as a sequence of states $S_0 S_1 \cdots S_l$ such that for any $0 \leq i < l$
either (i) $S_i \xrightarrow{t} S_{i+1}$ for some $t$, or (ii) $S_i = S_{i+1}$ and no $t$ is enabled at $S_i$. $S_l$
is reachable from $S_0$ in $l$ steps iff there is a computation $S_0 S_1 \cdots S_l$. We define
the length of a computation $S_0 S_1 \cdots S_l$ as $l$.

A Petri net is said to be 1-bounded or safe if the number of tokens in each
place does not exceed one for any state reachable from the initial state. Note
that no source transition (a transition without any input place) exists if a Petri
net is safe. For example, the Petri net shown in Figure 1 is safe. Figure 2 shows
the reachability graph of this Petri net. In this graph, each state is represented
by the places marked with a token.

Fig. 1. A Petri net.




Fig. 2. Reachability graph. 
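
The firing rule and the reachability graph can be sketched in a few lines of Python. Figure 1 itself is not reproduced in this text, so the net below (`PRE`, `POST`, `M0`) is an assumption: it is our reading of the enabling conditions and transition formulas given for the example net in the following sections, and the function names are hypothetical.

```python
from collections import deque

# Assumed structure of the example net: input and output places of t1..t7.
PRE = {"t1": {1}, "t2": {1}, "t3": {2}, "t4": {3}, "t5": {4}, "t6": {5}, "t7": {6, 7}}
POST = {"t1": {2, 3}, "t2": {4, 5}, "t3": {6}, "t4": {7}, "t5": {6}, "t6": {7}, "t7": {1}}
M0 = frozenset({1})  # assumed initial marking: a single token in p1

def enabled(marking, t):
    """t is enabled iff every input place of t holds a token."""
    return PRE[t] <= marking

def fire(marking, t):
    """Remove the tokens from the input places and mark the output places."""
    return frozenset((marking - PRE[t]) | POST[t])

def reachability_graph(m0):
    """Breadth-first exploration; raises if a firing would violate safeness."""
    seen, edges, queue = {m0}, [], deque([m0])
    while queue:
        m = queue.popleft()
        for t in PRE:
            if not enabled(m, t):
                continue
            if (m - PRE[t]) & POST[t]:
                raise ValueError("an output place already holds a token: net is not safe")
            m2 = fire(m, t)
            edges.append((m, t, m2))
            if m2 not in seen:
                seen.add(m2)
                queue.append(m2)
    return seen, edges

STATES, EDGES = reachability_graph(M0)
```

Under these assumptions the exploration terminates without raising, which is consistent with the net being safe.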



2.2 Symbolic Representation 

This subsection describes how a safe Petri net can be represented symbolically. 
For a safe Petri net, a state $S$ can be viewed as a Boolean vector $S = (s_1, \ldots, s_m)$
of length $m$ such that $s_i = 1$ iff place $p_i$ is marked with a token.

Any set of states can be represented as a Boolean function $f$ such that $f(S) =
1$ iff $S$ is in the set. We say that $f(S)$ is a characteristic function of the set.

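We denote by $E_t(S)$ the characteristic function of the set of states in which
transition $t$ is enabled; that is:

$$E_t(S) := \bigwedge_{p_i \in {}^\bullet t} s_i$$

For example, for the Petri net in Figure 1 we have:

$$E_{t_1}(S) := s_1,\quad E_{t_2}(S) := s_1,\quad E_{t_3}(S) := s_2,\quad E_{t_4}(S) := s_3,$$

$$E_{t_5}(S) := s_4,\quad E_{t_6}(S) := s_5,\quad E_{t_7}(S) := s_6 \wedge s_7$$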

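Any relation over states can be similarly encoded since a relation is simply
a set of tuples. Let $T_t(S,S')$ be the characteristic function for the relation $\xrightarrow{t}$.
$T_t(S,S')$ is represented as follows:

$$T_t(S,S') := E_t(S) \wedge \bigwedge_{p_i \in {}^\bullet t \setminus t^\bullet} \neg s_i' \wedge \bigwedge_{p_i \in t^\bullet} s_i' \wedge \bigwedge_{p_i \in P \setminus ({}^\bullet t \cup t^\bullet)} (s_i \leftrightarrow s_i')$$

For $t_1$ in the Petri net in Figure 1, for example, we have:

$$T_{t_1}(S,S') := s_1 \wedge \neg s_1' \wedge s_2' \wedge s_3' \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5') \wedge (s_6 \leftrightarrow s_6') \wedge (s_7 \leftrightarrow s_7')$$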
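
As a sanity check of this encoding, the following Python sketch evaluates $T_t(S,S')$ as a Boolean function and compares it with the firing rule on every pair of states. It is our own illustration: the net structure is again an assumption read off from the formulas for the example net, and the helper names are hypothetical.

```python
from itertools import product

PLACES = range(1, 8)
# Assumed structure of the example net (input and output places of t1..t7).
PRE = {"t1": {1}, "t2": {1}, "t3": {2}, "t4": {3}, "t5": {4}, "t6": {5}, "t7": {6, 7}}
POST = {"t1": {2, 3}, "t2": {4, 5}, "t3": {6}, "t4": {7}, "t5": {6}, "t6": {7}, "t7": {1}}

def E(t, S):
    """E_t(S): all input places of t are marked."""
    return all(S[p] for p in PRE[t])

def T(t, S, S2):
    """T_t(S, S'): the four conjuncts of the characteristic function."""
    touched = PRE[t] | POST[t]
    return (E(t, S)
            and all(not S2[p] for p in PRE[t] - POST[t])
            and all(S2[p] for p in POST[t])
            and all(S[p] == S2[p] for p in PLACES if p not in touched))

def fire(t, S):
    """Operational firing rule on Boolean vectors (dicts from place to bool)."""
    return {p: (p in POST[t]) or (S[p] and p not in PRE[t]) for p in PLACES}

def all_states():
    for bits in product([False, True], repeat=len(PLACES)):
        yield dict(zip(PLACES, bits))

# Count pairs (S, S') on which the formula and the firing rule disagree.
MISMATCHES = sum(
    1
    for t in PRE
    for S in all_states()
    for S2 in all_states()
    if T(t, S, S2) != (E(t, S) and S2 == fire(t, S))
)
```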

3 Reachability Checking 

3.1 Applying the Existing Encoding to Reachability Checking 

Let $R$ be the set of states whose reachability is to be checked and let $R(S)$ denote
its characteristic function. Although there are some variations [16], the basic
formula used for checking reachability is as follows:

$$I(S_0) \wedge T(S_0,S_1) \wedge T(S_1,S_2) \wedge \cdots \wedge T(S_{k-1},S_k) \wedge (R(S_0) \vee \cdots \vee R(S_k))$$

where $I(S)$ is the characteristic function of the set of the initial states, and
$T(S,S')$ is the transition relation function such that

$T(S,S') = 1$ iff $S'$ is reachable from $S$ in one step.

Clearly, $I(S_0) \wedge T(S_0,S_1) \wedge T(S_1,S_2) \wedge \cdots \wedge T(S_{k-1},S_k) = 1$ iff $S_0, S_1, \ldots, S_k$
is a computation from the initial states. Hence the above formula is satisfiable
iff some state in $R$ is reachable from one of the initial states in at most $k$ steps.
By checking the satisfiability of the formula, therefore, the verification can be
carried out.

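For Petri nets, $M_0$ is the only initial state. Thus we have:

$$I(S) := \bigwedge_{p_i \in P_0} s_i \wedge \bigwedge_{p_i \in P \setminus P_0} \neg s_i$$

where $P_0$ is the set of places marked with a token in $M_0$. For example, for the
Petri net in Figure 1, $I(S)$ will be:

$$I(S) := s_1 \wedge \neg s_2 \wedge \neg s_3 \wedge \neg s_4 \wedge \neg s_5 \wedge \neg s_6 \wedge \neg s_7$$

$T(S,S') = 1$ iff either (i) $S \xrightarrow{t} S'$ for some $t$, or (ii) $S = S'$ and no $t$ is enabled
at $S$. Hence we have:

$$T(S,S') := T_{t_1}(S,S') \vee \cdots \vee T_{t_n}(S,S') \vee \Big( \bigwedge_{p_i \in P} (s_i \leftrightarrow s_i') \wedge \neg E_{t_1}(S) \wedge \cdots \wedge \neg E_{t_n}(S) \Big)$$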

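In practice, this formula would be very large in size. Figure 3 shows the
formula $T(S,S')$ obtained from the Petri net in Figure 1. It should be noted
that the efficiency of SAT-based verification critically depends on the size of the
formula to be checked.

$$\begin{aligned}
& (s_1 \wedge \neg s_1' \wedge s_2' \wedge s_3' \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5') \wedge (s_6 \leftrightarrow s_6') \wedge (s_7 \leftrightarrow s_7')) \\
\vee\; & (s_1 \wedge \neg s_1' \wedge s_4' \wedge s_5' \wedge (s_2 \leftrightarrow s_2') \wedge (s_3 \leftrightarrow s_3') \wedge (s_6 \leftrightarrow s_6') \wedge (s_7 \leftrightarrow s_7')) \\
\vee\; & (s_2 \wedge \neg s_2' \wedge s_6' \wedge (s_1 \leftrightarrow s_1') \wedge (s_3 \leftrightarrow s_3') \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5') \wedge (s_7 \leftrightarrow s_7')) \\
\vee\; & (s_3 \wedge \neg s_3' \wedge s_7' \wedge (s_1 \leftrightarrow s_1') \wedge (s_2 \leftrightarrow s_2') \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5') \wedge (s_6 \leftrightarrow s_6')) \\
\vee\; & (s_4 \wedge \neg s_4' \wedge s_6' \wedge (s_1 \leftrightarrow s_1') \wedge (s_2 \leftrightarrow s_2') \wedge (s_3 \leftrightarrow s_3') \wedge (s_5 \leftrightarrow s_5') \wedge (s_7 \leftrightarrow s_7')) \\
\vee\; & (s_5 \wedge \neg s_5' \wedge s_7' \wedge (s_1 \leftrightarrow s_1') \wedge (s_2 \leftrightarrow s_2') \wedge (s_3 \leftrightarrow s_3') \wedge (s_4 \leftrightarrow s_4') \wedge (s_6 \leftrightarrow s_6')) \\
\vee\; & (s_6 \wedge s_7 \wedge \neg s_6' \wedge \neg s_7' \wedge s_1' \wedge (s_2 \leftrightarrow s_2') \wedge (s_3 \leftrightarrow s_3') \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5')) \\
\vee\; & ((s_1 \leftrightarrow s_1') \wedge (s_2 \leftrightarrow s_2') \wedge (s_3 \leftrightarrow s_3') \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5') \wedge (s_6 \leftrightarrow s_6') \wedge (s_7 \leftrightarrow s_7') \\
& \quad \wedge \neg s_1 \wedge \neg s_2 \wedge \neg s_3 \wedge \neg s_4 \wedge \neg s_5 \wedge \neg (s_6 \wedge s_7))
\end{aligned}$$

Fig. 3. $T(S,S')$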



3.2 Proposed Encoding 

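We define $d_t(S,S')$ as follows:

$$\begin{aligned}
d_t(S,S') &:= T_t(S,S') \vee \bigwedge_{p_i \in P} (s_i \leftrightarrow s_i') \\
&= \Big( \Big( \bigwedge_{p_i \in {}^\bullet t} s_i \wedge \bigwedge_{p_i \in {}^\bullet t \setminus t^\bullet} \neg s_i' \wedge \bigwedge_{p_i \in t^\bullet} s_i' \Big) \vee \bigwedge_{p_i \in {}^\bullet t \cup t^\bullet} (s_i \leftrightarrow s_i') \Big) \wedge \bigwedge_{p_i \in P \setminus ({}^\bullet t \cup t^\bullet)} (s_i \leftrightarrow s_i')
\end{aligned}$$

For the Petri net shown in Figure 1, for example, we have:

$$\begin{aligned}
d_{t_1}(S,S') := {} & ((s_1 \wedge \neg s_1' \wedge s_2' \wedge s_3') \vee ((s_1 \leftrightarrow s_1') \wedge (s_2 \leftrightarrow s_2') \wedge (s_3 \leftrightarrow s_3'))) \\
& \wedge (s_4 \leftrightarrow s_4') \wedge (s_5 \leftrightarrow s_5') \wedge (s_6 \leftrightarrow s_6') \wedge (s_7 \leftrightarrow s_7')
\end{aligned}$$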

It is easy to see that $d_t(S,S') = 1$ iff $S \xrightarrow{t} S'$ or $S = S'$. In other words,
$d_t(S,S')$ differs from $T_t(S,S')$ only in that $d_t(S,S')$ evaluates to true also when
$S = S'$. We now define a relation $\stackrel{t}{\Rightarrow}$ over states as follows: $S \stackrel{t}{\Rightarrow} S'$ iff $S \xrightarrow{t} S'$
or $S = S'$. Clearly $d_t(S,S') = 1$ iff $S \stackrel{t}{\Rightarrow} S'$.

A step (or more) can be represented by a conjunction of $d_t(S,S')$. Note that
this is in contrast to the ordinary encoding where a disjunction of $T_t(S,S')$ is
used to represent one step. Specifically, our proposed scheme uses the following
formula $\varphi_k$.

$$\varphi_k := \mathcal{M}_k \wedge R(S_{k*n})$$

where

$$\begin{aligned}
\mathcal{M}_k := {} & I(S_0) \\
& \wedge d_{t_1}(S_0,S_1) \wedge d_{t_2}(S_1,S_2) \wedge \cdots \wedge d_{t_n}(S_{n-1},S_n) \\
& \wedge d_{t_1}(S_n,S_{n+1}) \wedge d_{t_2}(S_{n+1},S_{n+2}) \wedge \cdots \wedge d_{t_n}(S_{2n-1},S_{2n}) \\
& \wedge \cdots \\
& \wedge d_{t_1}(S_{(k-1)*n},S_{(k-1)*n+1}) \wedge \cdots \wedge d_{t_n}(S_{k*n-1},S_{k*n})
\end{aligned}$$

If $\varphi_k$ is satisfiable, then there exists a state in $R$ that can be reached in at
most $k*n$ steps from the initial state $M_0$, because $\varphi_k$ evaluates to true iff (i)
$S_0 = M_0$, (ii) for any $0 \leq i < k*n$, $S_i \stackrel{t}{\Rightarrow} S_{i+1}$ for the transition $t$ encoded at
step $i$, and (iii) $S_{k*n} \in R$.

On the other hand, if $\varphi_k$ is unsatisfiable, then one can conclude that no
state in $R$ can be reached in $k$ steps or less. This can be explained as follows.
Suppose that a computation $M_0 M_1 \cdots M_l$ ($0 \leq l \leq k$) exists that starts from
the initial state $M_0$ and ends in a state $M_l$ in $R$. Let $t_{i_j}$ be the transition such that
$M_j \xrightarrow{t_{i_j}} M_{j+1}$. Then, $\varphi_k$ is satisfied by the following assignment: for $0 \leq j < l$,
$S_{j*n} = S_{j*n+1} = \cdots = S_{j*n+i_j-1} = M_j$ and $S_{j*n+i_j} = \cdots = S_{(j+1)*n} = M_{j+1}$; and for
$l \leq j < k$, $S_{j*n} = \cdots = S_{(j+1)*n} = M_l$.

An important observation is that the method may be able to find witness 
computations of length greater than k. The length of witnesses that can be found 
is at most k * n, and this upper bound is achievable. 

Another important advantage of our method is that it is possible to construct
a very succinct formula that has the same satisfiability as $\varphi_k$. Let
$S_j = (s_{1,j}, s_{2,j}, \ldots, s_{m,j})$. For each $d_t(S_j,S_{j+1})$ in $\varphi_k$, the term $(s_{i,j} \leftrightarrow s_{i,j+1})$ can
be removed for all $p_i \in P \setminus ({}^\bullet t \cup t^\bullet)$ by quantifying $s_{i,j+1}$. The reason for this
is as follows: Let $\varphi_k'$ be the subformula of $\varphi_k$ that is obtained by removing the
term $(s_{i,j} \leftrightarrow s_{i,j+1})$; that is, $\varphi_k = \varphi_k' \wedge (s_{i,j} \leftrightarrow s_{i,j+1})$. Because this term
$(s_{i,j} \leftrightarrow s_{i,j+1})$ occurs as a conjunct in $\varphi_k$, $\varphi_k$ evaluates to true only if $s_{i,j}$ and
$s_{i,j+1}$ have the same value. Hence $\varphi_k'$ with $s_{i,j+1}$ being replaced with $s_{i,j}$ has the
same satisfiability as $\varphi_k$. The other terms remaining in $\varphi_k$ can also be removed
in the same way.

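Below is the formula that is thus obtained from the Petri net in Figure 1
when $k = 1$.

$$\begin{aligned}
& I(S_0) \\
\wedge\; & ((s_{1,0} \wedge \neg s_{1,1} \wedge s_{2,1} \wedge s_{3,1}) \vee ((s_{1,0} \leftrightarrow s_{1,1}) \wedge (s_{2,0} \leftrightarrow s_{2,1}) \wedge (s_{3,0} \leftrightarrow s_{3,1}))) \\
\wedge\; & ((s_{1,1} \wedge \neg s_{1,2} \wedge s_{4,2} \wedge s_{5,2}) \vee ((s_{1,1} \leftrightarrow s_{1,2}) \wedge (s_{4,0} \leftrightarrow s_{4,2}) \wedge (s_{5,0} \leftrightarrow s_{5,2}))) \\
\wedge\; & ((s_{2,1} \wedge \neg s_{2,3} \wedge s_{6,3}) \vee ((s_{2,1} \leftrightarrow s_{2,3}) \wedge (s_{6,0} \leftrightarrow s_{6,3}))) \\
\wedge\; & ((s_{3,1} \wedge \neg s_{3,4} \wedge s_{7,4}) \vee ((s_{3,1} \leftrightarrow s_{3,4}) \wedge (s_{7,0} \leftrightarrow s_{7,4}))) \\
\wedge\; & ((s_{4,2} \wedge \neg s_{4,5} \wedge s_{6,5}) \vee ((s_{4,2} \leftrightarrow s_{4,5}) \wedge (s_{6,3} \leftrightarrow s_{6,5}))) \\
\wedge\; & ((s_{5,2} \wedge \neg s_{5,6} \wedge s_{7,6}) \vee ((s_{5,2} \leftrightarrow s_{5,6}) \wedge (s_{7,4} \leftrightarrow s_{7,6}))) \\
\wedge\; & ((s_{6,5} \wedge s_{7,6} \wedge \neg s_{6,7} \wedge \neg s_{7,7} \wedge s_{1,7}) \vee ((s_{1,2} \leftrightarrow s_{1,7}) \wedge (s_{6,5} \leftrightarrow s_{6,7}) \wedge (s_{7,6} \leftrightarrow s_{7,7}))) \\
\wedge\; & R(S_7)[s_{2,7} \leftarrow s_{2,3},\; s_{3,7} \leftarrow s_{3,4},\; s_{4,7} \leftarrow s_{4,5},\; s_{5,7} \leftarrow s_{5,6}]
\end{aligned}$$

where $F[x \leftarrow y]$ represents a formula generated by substituting $y$ for $x$ in $F$.

Note that this resulting formula is considerably smaller than $T(S,S')$ in Figure 3,
since $T_t(S,S')$ contains at least $m$ ($= |P|$) literals, while its counterpart in our
encoding only contains at most $4|{}^\bullet t| + 3|t^\bullet|$ literals. The difference becomes larger
if the number of places that are neither input nor output places increases for
each transition.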

It should also be noted that in this particular case, $k = 1$ is enough to explore
all the reachable states. That is, for any reachable state $M$, a sequence of states
$S_0 S_1 \cdots S_7$ exists such that: $S_0 = M_0$; for any $i$, $S_{i-1} \xrightarrow{t_i} S_i$ or $S_{i-1} = S_i$; and
$S_7 = M$.
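
This observation can be checked by explicit enumeration, used here in place of a SAT solver. The Python sketch below is our own illustration: the net is our reading of the formulas for the example net (an assumption, since Figure 1 is not reproduced here), and `chained_rounds` mimics the chain $d_{t_1}, \ldots, d_{t_n}$ by letting each transition in turn either fire or leave the state unchanged.

```python
# Assumed structure of the example net (input and output places of t1..t7).
PRE = {"t1": {1}, "t2": {1}, "t3": {2}, "t4": {3}, "t5": {4}, "t6": {5}, "t7": {6, 7}}
POST = {"t1": {2, 3}, "t2": {4, 5}, "t3": {6}, "t4": {7}, "t5": {6}, "t6": {7}, "t7": {1}}
ORDER = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
M0 = frozenset({1})

def fire(marking, t):
    return frozenset((marking - PRE[t]) | POST[t])

def reachable(m0):
    """All states reachable from m0 by ordinary firing."""
    seen, stack = {m0}, [m0]
    while stack:
        m = stack.pop()
        for t in ORDER:
            if PRE[t] <= m:
                m2 = fire(m, t)
                if m2 not in seen:
                    seen.add(m2)
                    stack.append(m2)
    return seen

def within_steps(m0, k):
    """States reachable in at most k ordinary steps (what the usual unrolling covers)."""
    seen, frontier = {m0}, {m0}
    for _ in range(k):
        frontier = {fire(m, t) for m in frontier for t in ORDER if PRE[t] <= m}
        seen |= frontier
    return seen

def chained_rounds(m0, k, order):
    """Possible values of S_{k*n}: in each round every transition of `order`
    is offered once, and a state may either fire it (if enabled) or stay put."""
    current = {m0}
    for _ in range(k):
        for t in order:
            current = current | {fire(m, t) for m in current if PRE[t] <= m}
    return current

ALL_REACHABLE = reachable(M0)
ONE_ROUND = chained_rounds(M0, 1, ORDER)
ONE_STEP = within_steps(M0, 1)
ONE_ROUND_REVERSED = chained_rounds(M0, 1, list(reversed(ORDER)))
```

In this sketch the ordering of the transitions matters: with the order $t_1, \ldots, t_7$ a single round already covers every reachable state, whereas the reversed order covers no more than a single ordinary step does.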

In the worst case, our encoding introduces $n + 1$ different Boolean variables
for a single place in order to encode one step. Compared with the ordinary 
encoding which requires only two different variables (for the current and next 
states), this number might seem prohibitively large. In practice, however, this 
rarely causes a problem, because for many transitions in Petri nets representing 
practical concurrent systems, input and output places are often only a small 
fraction of all places, and from our experience, the performance of SAT solvers 
critically depends on the number of literals, but not on the number of different 
variables. 

3.3 Complexity Issues 

Reachability checking is a very difficult problem in terms of computational complexity.
For arbitrary Petri nets, the problem is decidable but EXPSPACE-hard.
Although the problem is easier if the Petri net is safe, it is still
PSPACE-complete [3].

The proposed method reduces the reachability problem to the propositional 
satisfiability problem. It is well known that the latter problem is NP-complete.
In many practical situations, however, our method works well for the following 
reasons: first, recently developed SAT solvers can often perform very effectively 
due to powerful heuristics; and second, if the state to be checked is reachable 
and is close to the initial state, it will suffice to check a small formula to decide 
the reachability. 

3.4 Related Properties 

Here we show that various properties can be verified with our method by
appropriately adjusting $R(S)$, which represents the state set whose reachability is to
be checked.

L0- and L1-liveness. A transition t is said to be L1-live iff t can be fired at
least once. If t is not L1-live, then t is said to be L0-live or dead. L1-liveness
can be verified by checking the reachability of a state where the transition of
interest is enabled. The characteristic function of the states where a transition t
is enabled is:

$$\bigwedge_{p_i \in {}^\bullet t} s_i$$

Deadlock. A Petri net has a deadlock if there exists a reachable state where
no transition can be fired. The set of states where no transition is enabled can
be represented as:

$$\neg \bigvee_{t \in T} \bigwedge_{p_i \in {}^\bullet t} s_i$$
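
The two characteristic functions above can be evaluated directly on a concrete marking. The following is a minimal sketch, assuming a safe net whose markings are represented as sets of marked places; the names `Preset`, `enabled`, `is_dead_state` and the toy net `PRESET` are hypothetical and not taken from the paper.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical representation: each transition name maps to its set of input places.
Preset = Dict[str, FrozenSet[str]]


def enabled(marking: Set[str], preset: Preset, t: str) -> bool:
    """Characteristic function of the states where t is enabled:
    the conjunction of s_i over all input places p_i of t."""
    return all(p in marking for p in preset[t])


def is_dead_state(marking: Set[str], preset: Preset) -> bool:
    """Characteristic function of deadlock states: no transition is enabled."""
    return not any(enabled(marking, preset, t) for t in preset)


# Toy net (an assumption for illustration): t1 needs p1, t2 needs p2 and p3.
PRESET: Preset = {"t1": frozenset({"p1"}), "t2": frozenset({"p2", "p3"})}
```

In the SAT-based setting these predicates are not evaluated on one marking but conjoined, as the formula R(S), with the unrolled transition relation.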



4 L3-Liveness Verification 



In this section, we discuss verifying properties that cannot be checked by reachability
analysis. As an example, we consider the problem of checking L3-liveness;
a transition t is said to be L3-live iff a computation starting from the initial
state exists on which t is fired infinitely often.

In our method, the L3-liveness of a given transition is checked by searching for
a computation that contains a loop where t is fired. This idea is the same as
LTL bounded model checking; but in our case a slight modification is required 
to the method described in the previous section, in order to select only loops 
that involve the firing of the transition of interest. 

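Specifically we introduce $d'_t(S, S', f)$ as an alternative representation to
$d_t(S, S')$. Here $f$ is a Boolean variable. $d'_t(S, S', f)$ is defined as follows.

$$d'_t(S, S', f) := \Bigl(f \leftrightarrow \bigl(\bigwedge_{p_i \in {}^\bullet t} s_i \wedge \bigwedge_{p_i \in {}^\bullet t \setminus t^\bullet} \neg s'_i \wedge \bigwedge_{p_i \in t^\bullet} s'_i\bigr)\Bigr) \wedge \Bigl(\neg f \rightarrow \bigwedge_{p_i \in {}^\bullet t \cup t^\bullet} (s_i \leftrightarrow s'_i)\Bigr) \wedge \bigwedge_{p_i \in P \setminus ({}^\bullet t \cup t^\bullet)} (s_i \leftrightarrow s'_i)$$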



It is easy to see that $d_t(S, S') = \exists f.[d'_t(S, S', f)]$. Also the following two properties
hold:

- $d'_t(S, S', 0) = 1$ iff $S' = S$ holds and $S \xrightarrow{t} S'$ does not hold.

- $d'_t(S, S', 1) = 1$ iff $S \xrightarrow{t} S'$.
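
These two properties, together with the identity relating the two step formulas, can be checked exhaustively on a small example. The sketch below assumes that the step formula means "either t fires or the state stays unchanged"; the three-place net, the transition with input p1 and output p2, and all function names are hypothetical choices made only for illustration.

```python
from itertools import product

PLACES = ["p1", "p2", "p3"]          # hypothetical net with three places
PRE, POST = {"p1"}, {"p2"}           # hypothetical transition t: p1 -> p2


def firing_condition(s, s2):
    """Inputs marked before; pure inputs empty and outputs marked after."""
    return (all(s[p] for p in PRE)
            and all(not s2[p] for p in PRE - POST)
            and all(s2[p] for p in POST))


def untouched_equal(s, s2):
    """Places that are neither input nor output of t keep their value."""
    return all(s[p] == s2[p] for p in PLACES if p not in PRE | POST)


def fires(s, s2):
    """S --t--> S'."""
    return firing_condition(s, s2) and untouched_equal(s, s2)


def d(s, s2):
    """Assumed reading of d_t(S, S'): t fires or nothing changes."""
    return fires(s, s2) or s == s2


def d_prime(s, s2, f):
    """d'_t(S, S', f): f holds exactly when the firing condition holds."""
    return (f == firing_condition(s, s2)
            and (f or all(s[p] == s2[p] for p in PRE | POST))
            and untouched_equal(s, s2))


def all_states():
    return [dict(zip(PLACES, bits))
            for bits in product([False, True], repeat=len(PLACES))]


def check_properties():
    """Brute-force check over all pairs of states."""
    for s, s2 in product(all_states(), repeat=2):
        assert d(s, s2) == (d_prime(s, s2, False) or d_prime(s, s2, True))
        assert d_prime(s, s2, True) == fires(s, s2)
        assert d_prime(s, s2, False) == (s == s2 and not fires(s, s2))
    return True
```

Calling `check_properties()` enumerates all 64 state pairs of the toy net; it is a sanity check on one transition, not a proof for arbitrary nets.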

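Let $M'_k$ be defined as follows:

$$
\begin{aligned}
M'_k := {} & I(S_0) \\
& \wedge d_{t_1}(S_0, S_1) \wedge d_{t_2}(S_1, S_2) \wedge \cdots \wedge d_{t_{m-1}}(S_{m-2}, S_{m-1}) \\
& \wedge d'_{t_m}(S_{m-1}, S_m, f_{m-1}) \wedge d_{t_{m+1}}(S_m, S_{m+1}) \wedge \cdots \wedge d_{t_n}(S_{n-1}, S_n) \\
& \wedge d_{t_1}(S_n, S_{n+1}) \wedge d_{t_2}(S_{n+1}, S_{n+2}) \wedge \cdots \wedge d_{t_{m-1}}(S_{n+m-2}, S_{n+m-1}) \\
& \wedge d'_{t_m}(S_{n+m-1}, S_{n+m}, f_{n+m-1}) \wedge d_{t_{m+1}}(S_{n+m}, S_{n+m+1}) \wedge \cdots \wedge d_{t_n}(S_{2n-1}, S_{2n}) \\
& \wedge \cdots \\
& \wedge d_{t_1}(S_{(k-1)*n}, S_{(k-1)*n+1}) \wedge \cdots \\
& \wedge d'_{t_m}(S_{(k-1)*n+m-1}, S_{(k-1)*n+m}, f_{(k-1)*n+m-1}) \wedge \cdots \\
& \wedge d_{t_n}(S_{k*n-1}, S_{k*n})
\end{aligned}
$$

$M'_k$ differs from $M_k$ only in that $d_{t_m}(S, S')$ is replaced with $d'_{t_m}(S, S', f)$.
Since $d_t(S, S') = \exists f.[d'_t(S, S', f)]$, if $M'_k$ is satisfied by $S_0, S_1, \ldots, S_{k*n}$, then,
for any $0 \le i < k*n$, $S_{i+1}$ is either obtained from $S_i$ by firing a transition or equal to $S_i$.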

Now let us define ${}_i\mathcal{L}_k$ as follows:

$$ {}_i\mathcal{L}_k := (S_{i*n} = S_{k*n}) \wedge (f_{i*n+m-1} \vee f_{(i+1)*n+m-1} \vee \cdots \vee f_{(k-1)*n+m-1}) $$

If both $M'_k$ and ${}_i\mathcal{L}_k$ are satisfied by $S_0, \ldots, S_{k*n}$, then $S_{i*n} \cdots S_{k*n}$
comprises a loop in which $S_j \xrightarrow{t_m} S_{j+1}$ for some $j$ such that $i*n \le j < k*n$.
On the other hand, if a computation Mq ■ ■ ■ Mk exists such that Mi is identical 
to Mk and Mj ^ Mj+i{i < j < k), then A can be satisfied by 

assigning as follows: (i) for 0 < j < k, • • • , = Mj and 

= Mj+i, and (ii) for j {i < j < k), if Mj ^ Mj+i, then 

fj+n+m-l = 1; otherwise fj+n+m-l = 0. 

As a result, we obtain the formula:

$$ M'_k \wedge \bigvee_{0 \le i \le k-1} {}_i\mathcal{L}_k $$

From the above discussion, one can see that if this formula is satisfiable, then
transition $t_m$ is L3-live; otherwise, no computation of length less than or equal
to $k$ exists that is a witness for the L3-liveness of $t_m$.

5 Transition Ordering 

In this section, we introduce an important pre-processing procedure for our 
method. Thus far we have not discussed the order of the transitions; we have 
implicitly assumed that transitions with small indices are taken first into con- 
sideration in constructing a formula to be checked. However the state space that 
can be explored by our method critically depends on this order of transitions. 
As an example, take the Petri net shown in Figure 1 again. As stated before, all
reachable states can be explored by our method even with $k = 1$ if transitions are
considered in the order $t_1, t_2, \ldots, t_7$. Now suppose that the order were the opposite,
that is, $t_7, t_6, \ldots, t_1$. In this case the number of states that can be checked
would be considerably reduced. Precisely, when $k = 1$, the set of states that can be
reached would be $\{M_0, M_1, M_2\}$.
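
The effect of the ordering can be reproduced with an explicit-state simulation of the step semantics. The following is a sketch only: it does not build a SAT formula, and it uses a hypothetical three-place chain net rather than the net of Figure 1. The names `step`, `explored`, `CHAIN` and `M0` are assumptions introduced for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Marking = FrozenSet[str]
Net = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]   # t -> (inputs, outputs)


def step(states: Set[Marking], net: Net, t: str) -> Set[Marking]:
    """States after considering t once: each state may fire t (if enabled) or stay."""
    pre, post = net[t]
    out = set(states)                       # staying unchanged is always allowed
    for m in states:
        if pre <= m:
            out.add((m - pre) | post)
    return out


def explored(net: Net, order: List[str], m0: Marking, k: int) -> Set[Marking]:
    """Markings covered by k rounds over the transitions in the given order."""
    states = {m0}
    for _ in range(k):
        for t in order:
            states = step(states, net, t)
    return states


# Hypothetical chain net: p1 -t1-> p2 -t2-> p3.
CHAIN: Net = {
    "t1": (frozenset({"p1"}), frozenset({"p2"})),
    "t2": (frozenset({"p2"}), frozenset({"p3"})),
}
M0: Marking = frozenset({"p1"})
```

With the order `["t1", "t2"]` a single round covers all three markings of the chain, whereas the reversed order needs two rounds, which mirrors the observation made in the text.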

To obtain an appropriate order, we develop a heuristic as shown in Figure
4. In this algorithm a set Visited and a queue Done are used to represent the
set of already visited places and the order of the transitions, respectively. This
algorithm traverses the net structure in a depth-first manner, from a place with
a token in the initial state $M_0$. The procedure visit_place() is called with the
parameter p being that place. In the procedure, p is added to Visited first. Then
for each of the output transitions, say t, the following is done: if t has not yet
been ordered and all its input places have been visited, then t is enqueued and
the procedure is recursively called with the parameter being each of t's output
places. Because a transition t is ordered after all its input places are visited, t is
usually ordered later than the transitions in ${}^\bullet({}^\bullet t)$.

Transitions that cannot be reached from the initially marked places are not
ordered in this algorithm. Since a safe Petri net has no source transition, those
transitions will never be enabled and thus can be safely ignored.

Finally we should remark that a problem similar to this transition ordering
arises in using transition chaining [14], a technique for speeding up BDD-based
reachability analysis. The authors of [14] suggested that given a transition t, all
transitions in ${}^\bullet({}^\bullet t)$ should be ordered first. However, no detailed algorithm was
presented in their paper.



main {
    set Visited := ∅;
    queue Done := ∅;
    for p ∈ P_0        // P_0 is the set of marked places in M_0.
        call visit_place(p);
}

visit_place(p) {
    add p to Visited;
    for t ∈ p•
        if (t ∉ Done and ∀p ∈ •t [p ∈ Visited]) {
            enqueue t to Done;
            for p : p ∈ t• and p ∉ Visited
                call visit_place(p);
        }
}



Fig. 4. Algorithm for ordering transitions. 
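
The heuristic of Fig. 4 can be written as a short runnable function. This is a sketch under assumptions: the net is given by two hypothetical dictionaries `pre` and `post` mapping each transition to its input and output places, and ties are broken alphabetically, a detail the figure leaves open.

```python
from collections import deque
from typing import Dict, List, Set


def order_transitions(pre: Dict[str, Set[str]], post: Dict[str, Set[str]],
                      marked: List[str]) -> List[str]:
    """Depth-first ordering heuristic in the spirit of Fig. 4.

    pre[t] / post[t] are the input / output places of transition t;
    marked lists the places holding a token in the initial marking.
    """
    visited: Set[str] = set()
    done: deque = deque()          # queue of ordered transitions
    ordered: Set[str] = set()      # fast membership test for `done`

    def visit_place(p: str) -> None:
        visited.add(p)
        # Output transitions of p, i.e. transitions having p as an input place.
        for t in sorted(t for t in pre if p in pre[t]):
            if t not in ordered and pre[t] <= visited:
                done.append(t)
                ordered.add(t)
                for q in sorted(post[t]):
                    if q not in visited:
                        visit_place(q)

    for p in marked:
        visit_place(p)
    return list(done)


# Hypothetical cyclic net: p1 -t1-> p2 -t2-> p3 -t3-> p1, with p1 initially marked.
PRE = {"t1": {"p1"}, "t2": {"p2"}, "t3": {"p3"}}
POST = {"t1": {"p2"}, "t2": {"p3"}, "t3": {"p1"}}
```

The recursion follows the figure literally; for very large nets an explicit stack would avoid Python's recursion limit.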



6 Experimental Results 

We conducted an experimental evaluation using a Linux workstation with a 2 GHz
Xeon processor and 512 MB of memory. All Petri nets used in the experiment
were taken from [8]. Table 1 shows the size of these Petri nets. Remember that
$m = |P|$ is the number of places and $n = |T|$ is the number of transitions. The
'States' column represents the number of reachable states reported in [8]. These
Petri nets all contain a deadlock.

We checked the following properties: (1) L1-liveness for a transition; (2) deadlock
freedom; and (3) L3-liveness for a transition. For (1) and (3), the transition
to be checked was selected at random. In the experiment, ZChaff, an implementa- 
tion of Chaff [13], was used as a SAT solver. Structure preserving transformation 
[15] was used for transforming the formula to be checked into CNF, as ZChaff 
only takes a CNF Boolean formula as an input. 
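
To illustrate why a CNF conversion step is needed and what a definitional encoding looks like, here is a generic Tseitin-style sketch. It is an assumption made for illustration and is not claimed to be the exact transformation of [15]; the tuple-based formula representation and the function names are hypothetical. Literals are non-zero integers, as in the DIMACS convention.

```python
from itertools import count, product


def to_cnf(formula):
    """Definitional (Tseitin-style) CNF.

    formula is a nested tuple: ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g).
    Returns (clauses, variable_ids); the root is asserted by a unit clause.
    """
    ids, clauses, fresh = {}, [], count(1)

    def lit(f):
        op = f[0]
        if op == "var":
            if f[1] not in ids:
                ids[f[1]] = next(fresh)
            return ids[f[1]]
        if op == "not":
            return -lit(f[1])
        a, b = lit(f[1]), lit(f[2])
        x = next(fresh)                      # fresh variable naming this subformula
        if op == "and":                      # x <-> (a and b)
            clauses.extend([[-x, a], [-x, b], [x, -a, -b]])
        elif op == "or":                     # x <-> (a or b)
            clauses.extend([[x, -a], [x, -b], [-x, a, b]])
        else:
            raise ValueError(op)
        return x

    clauses.append([lit(formula)])
    return clauses, ids


def satisfiable(clauses):
    """Brute-force satisfiability check, only meant for tiny examples."""
    n_vars = max(abs(l) for c in clauses for l in c)
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

The number of clauses grows linearly with the size of the formula, which is the reason such encodings are preferred over distributing disjunctions over conjunctions.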

For each model we incremented k until a satisfying assignment (that is a 
counterexample or a witness) was found. Table 2 shows the time (in seconds) 

Table 1. Problem Instances.

Problem      m      n      States
DARTES(1)    331    251    >1500000
DP(12)       72     48     531440
ELEV(1)      63     99     163
ELEV(2)      146    299    1092
ELEV(3)      327    783    7276
ELEV(4)      736    1939   48217
HART(25)     94     92     >1000000
HART(50)     252    152    >1000000
HART(75)     377    227    >1000000
HART(100)    502    302    >1000000
KEY(2)       94     92     536
KEY(3)       129    133    4923
KEY(4)       164    174    44819
MMGT(2)      86     114    816
MMGT(3)      122    172    7702
MMGT(4)      158    232    66308
Q(1)         163    194    123596

required by ZChaff to find a satisfying assignment for the value of k. For some
cases, a counterexample/witness could not be found because no such computation
existed, processing was not completed within a reasonable amount of time,
or memory shortage occurred. For these cases we show the largest k for which
our method was able to prove that no satisfying assignment exists (denoted as
'> k') and the time used to prove it (denoted with parentheses).

For comparison purposes, the following two other methods were tested: ordi- 
nary SAT-based bounded model checking and BDD-based model checking. Both 
methods are implemented in the NuSMV tool [4]. We used the PEP tool [7] 
to generate the input programs to NuSMV from the Petri nets. Table 2 shows 
the results of using these methods; the “SAT” columns and the “BDD” columns 
show the results of using ordinary SAT-based bounded model checking and those 
of BDD-based model checking, respectively. L3-liveness was not checked because 
this property is not always verifiable with these methods. 

From the results, it can be seen that the proposed method completed veri- 
fication for most of the problems that the two existing methods were not able 
to solve. When compared to the ordinary bounded model checking, one can see 
that our method required a much smaller value of k to find a witness. As a result, 
even for the problems the existing methods were able to handle (e.g., DP(12) or 
KEY(2)), our method often outperformed these methods in execution time. 

Table 2. Results.

           L1-Liveness                     Deadlock                          L3-Liveness
           Ours        NuSMV SAT   BDD     Ours         NuSMV SAT     BDD    Ours
Problem    k    Time   k    Time   Time    k    Time    k    Time     Time   k    Time
DARTES(1)  1    0.01   >1   0.01   NA      1    0.02    >1   0.02     NA     >10  0.18
DP(12)     1    0.0    3    0.01   NA      1    0.0     12   37308.6  NA     1    0.0
ELEV(1)    2    0.01   8    1.14   0.46    1    0.04    9    0.26     0.55   >27  0.19
ELEV(2)    2    0.03   >5   1.75   8.72    1    1.12    >5   6.82     32.46  >8   0.14
ELEV(3)    2    0.11   >1   0.0    NA      1    14.64   >1   0.02     NA     >3   0.05
ELEV(4)    >1   0.07   >0   0.0    NA      1    208.4   >0   0.01     NA     >1   0.0
HART(25)   1    0.0    >6   0.47   NA      1    0.0     >6   5.67     NA     >38  1022.46
HART(50)   1    0.0    >2   0.02   NA      1    0.0     >2   0.02     NA     >19  4.47
HART(75)   1    0.0    >1   0.0    NA      1    0.0     >1   0.01     NA     >12  158.68
HART(100)  1    0.01   >0   0.0    NA      1    0.0     >0   0.0      NA     >9   34.11
KEY(2)     1    0.0    6    0.03   746.07  1    0.0     >9   1.61     NA     >29  0.29
KEY(3)     1    0.01   6    0.04   NA      1    0.0     >6   1.43     NA     >20  0.4
KEY(4)     1    0.01   >4   0.14   NA      1    0.0     >4   0.18     NA     >15  0.28
MMGT(2)    1    0.0    5    0.03   0.65    1    0.0     8    0.07     0.71   >22  0.26
MMGT(3)    2    0.01   >6   2.81   1.49    2    0.07    >6   46.78    1.6    >14  0.18
MMGT(4)    3    0.02   >5   0.24   3.53    3    0.25    >5   12.64    3.61   >10  0.18
Q(1)       >13  5.28   >4   0.19   NA      1    0.02    >4   0.49     NA     >13  3.3



7 Conclusions 

In this paper, we proposed a new method for bounded model checking. By ex- 
ploiting the interleaving nature of Petri nets, our method generates much more 
succinct formulas than ordinary bounded model checking, thus resulting in high 
efficiency. We applied the proposed method to a collection of safe Petri nets. The 
results showed that our method outperformed ordinary bounded model checking 
in all cases tested, and beat BDD-based model checking especially when a short
computation exists that is a counterexample/witness.

There are several directions in which further work is needed. First a more 
comprehensive comparison is needed with existing verification methods other 
than those discussed here. Especially, comparison with bounded reachability 
checking proposed by Heljanko [8] should be conducted, since his approach is 
similar to ours in the sense that both can be used for reachability checking for 
safe Petri nets. Other important verification methods include those based on 
partial order reduction. 

We think that applying the proposed method to models other than (pure)
Petri nets is also important. To date we have obtained some results from applying
our method to detection of feature interactions in telecommunication services 
[17]. Hardware verification (e.g., [10,18]) is also an important area where the 
applicability of our method should be tested. 

Another direction is to extend the method to verify properties other than
reachability and L3-liveness. Currently we are working on modifying the proposed
method for verification of arbitrary LTL$_{-X}$ formulas. LTL$_{-X}$ is an important
class of temporal logic; many model checking tools, such as SPIN [11] and
the one described in [9], can be used to verify temporal formulas in this class.

Recently it was shown that SAT can be used, in combination with unfolding 
[12], for coverability checking of unbounded Petri nets [1]. Our approach cannot
be directly applied to unbounded Petri nets, but we think that extending the
proposed method to infinite state systems is a challenging but important issue.



Acknowledgments. The authors wish to thank anonymous referees for their 
useful comments. 

Abstract. We apply linear algebra techniques to over-approximate the
reachability relation of a numerical system (Petri nets, counter automata,
timed automata and so on) by a transitive and reflexive finite
union of affine spaces. Thanks to this kind of approximation, we naturally
define the notion of disjunctive place invariants. All the results
presented in this paper have been implemented as a plug-in for our symbolic
model-checker Fast and applied to the 40 systems available on the
FAST-homepage.



1 Introduction 

The reachability problem often reduces to the effective computation of the reach- 
ability relation of a system. In general this computation is not possible and we 
cannot even decide if a given state is reachable from an initial one. These unde- 
cidability results explain why we are interested in developing some approximated 
computation of the reachability relation. 

In this paper, we are interested in using linear algebra techniques in order to
get an over-approximation of the reachability relation of a system that uses $m \ge 1$
rational variables $x_1, \ldots, x_m$, called a numerical system. This is for instance the
case for timed automata, Petri nets and their extensions, and counter automata.
Under some algebraic conditions that are often met, we have proved in [Ler04] that the
set of affine relations, also called place invariants, $\alpha_0 + \sum_{i=1}^{m} (\alpha_i x_i + \alpha'_i x'_i) = 0$
valid for any couple $((x_1, \ldots, x_m), (x'_1, \ldots, x'_m))$ in the reachability relation of a
numerical system, can be computed in polynomial time. Moreover, we have also
shown that the over-approximation of the reachability relation obtained is equal
to the least reflexive and transitive affine relation that contains it.

Unfortunately the computation of these invariants is often useless when
the analyzed system has some control locations. In fact, in this case, we are
naturally interested in computing some affine relations between the variables
that depend on the control locations. That means, we want to compute a
reflexive and transitive finite union of affine relations (one affine relation for
each pair of control locations) that over-approximates the reachability relation.

Our results. 

- We introduce the class of finite unions of affine spaces, called semi-affine
spaces. By proving that any infinite intersection of semi-affine spaces remains
semi-affine, we show that any relation over $\mathbb{Q}^m$ can be over-approximated
by a least reflexive and transitive semi-affine relation called the semi-affine
star hull. This property proves that independently of the knowledge of the
control locations of the considered system, we can define the most precise
semi-affine over-approximation.

- We prove that the semi-affine star hull of any affine relation is effectively
computable. This result is used in a semi-algorithm that computes the
semi-affine star hull of any semi-affine relation when it terminates. Even if the
termination of this semi-algorithm is left open, we prove that it stops on a
large class of semi-affine relations, which enables us to compute the semi-affine
star hull of the reachability relation of any upward-closed finite linear system,
a class that contains any reset/transfer Petri net [DFS98,Cia94] and any
broadcast protocol [EN98,Del00].

- We have implemented this semi-algorithm as a plug-in for our symbolic
model-checker Fast and we have tested it on all the 40 examples
of counter systems freely available on the FAST-homepage [Fas]. Thanks
to the approach developed in this paper, our algorithm does not require the
user to describe the control locations of the system because it discovers them
automatically during the computation.

Related works. 

In [Kar76], Karr presented a polynomial time algorithm that computes the set
of affine relations that hold in a control location of some numerical systems.
Recently, the complexity of this algorithm was revisited in [MOS04b] and a fine
upper bound was presented. In [CH78], the use of convex polyhedra rather
than affine spaces allows the discovery of inequalities between variables such as $x_1 \le 2 x_2$.
By replacing the affine spaces by roots of polynomials with a bounded degree,
it is shown in [MOS04a] that equations of the form $x_1 x_2 = x_3$ can be automatically
discovered.

The computation of place invariants for many Petri net extensions was
studied in [Cia94]. In [Ler04], this computation was proved to be possible for
any numerical system as soon as the affine hull of the reachability relation in
one step is computable. For applications of place invariants to the verification
of infinite state systems, see for instance [DRV01].

Plan of the paper. 

Some definitions and techniques from the field of linear algebra are recalled in
section 2. In section 3, we introduce the class of semi-affine spaces and the
definition of the semi-affine star hull of a relation. After having proved that the
semi-affine star hull of an affine relation is effectively computable in section 4,
we show in section 5 how to use these results to implement a semi-algorithm that
computes the semi-affine star hull of any semi-affine relation. Finally, in
section 6, practical results provided by an implementation of the semi-affine star
hull as a plug-in of our symbolic model-checker Fast [BFLP03] are presented.

Some proofs had to be omitted due to space constraints. A self-contained
long version of this paper (with detailed proofs for all results) can be obtained
from the author.



2 Elements of Linear Algebra 

In this section, we recall some classical results of linear algebra. 

The cardinal of a finite set $X$ is written $\mathrm{card}(X)$. The sets of rational numbers
and non-negative integers are respectively written $\mathbb{Q}$ and $\mathbb{N}$. The set of vectors
with $m \ge 1$ components in a set $X$ is written $X^m$. The zero vector $(0, \ldots, 0)$ is
just written $0$. Given an integer $i \in \{1, \ldots, m\}$ and a vector $x \in X^m$, the $i$-th
component of $x$ is written $x[i] \in X$. The vectors $x + y$ and $t.x$ are defined by
$(x + y)[i] = x[i] + y[i]$ and $(t.x)[i] = t.(x[i])$ for any $i \in \{1, \ldots, m\}$, $x, y \in \mathbb{Q}^m$,
and $t \in \mathbb{Q}$. For any $i \in \{1, \ldots, m\}$, we define the unit vector $e_i \in \mathbb{Q}^m$ by
$e_i[j] = 0$ if $j \ne i$ and $e_i[i] = 1$. The set of matrices with $m$ rows and $n$ columns
in $\mathbb{Q}$ is written $\mathcal{M}_{m,n}(\mathbb{Q})$. When $m = n$, we denote by $\mathcal{M}_m(\mathbb{Q}) = \mathcal{M}_{m,m}(\mathbb{Q})$
the set of square matrices. The product of two matrices $M, M'$ is written $M.M'$
and the product of a matrix $M$ by a vector $v$ is written $M.v$. We naturally
define $A + B = \{a + b;\ (a, b) \in A \times B\}$ and $T.A = \{t.a;\ (t, a) \in T \times A\}$ for
any $A, B \subseteq \mathbb{Q}^m$ and $T \subseteq \mathbb{Q}$. When $A = \{a\}$, $B = \{b\}$ or $T = \{t\}$, we write
$a + B = \{a\} + B$, $A + b = A + \{b\}$, $t.A = \{t\}.A$ and $T.a = T.\{a\}$.

An affine space $A$ of $\mathbb{Q}^m$ is a subset of $\mathbb{Q}^m$ such that for any finite sequence
$(t_i)_{i \in I} \subseteq \mathbb{Q}$ such that $\sum_{i \in I} t_i = 1$, we have $\sum_{i \in I} t_i.A \subseteq A$. The affine hull $\mathrm{aff}(X)$ of
a set $X \subseteq \mathbb{Q}^m$ is the least affine space for the inclusion that contains $X$ (recall
that any infinite intersection of affine spaces remains an affine space). An affine
basis of an affine space $A$ is a minimal (for the inclusion) finite subset $B \subseteq A$
such that $A = \mathrm{aff}(B)$. Recall that the integer $\mathrm{card}(B) - 1$ is independent of $B$
and is called the dimension of $A$. A vector space is an affine space $A$ such that
$0 \in A$. The direction of a non-empty affine space $A$ is the vector space defined
by $\overrightarrow{A} = A - A$.
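
As a small illustration of these definitions, the dimension of the affine hull of a finite set of points can be computed as the rank of the differences to one base point, which spans the direction of the hull. The sketch below uses exact rational arithmetic; the function names are hypothetical, and the convention that the empty set has dimension -1 is the one used later in the proof of Proposition 3.1.

```python
from fractions import Fraction
from typing import List, Sequence


def rank(rows: List[List[Fraction]]) -> int:
    """Rank of a rational matrix by Gaussian elimination."""
    rows = [r[:] for r in rows]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def affine_dimension(points: Sequence[Sequence[int]]) -> int:
    """Dimension of aff(points), i.e. card(B) - 1 for an affine basis B."""
    if not points:
        return -1                       # aff of the empty set is empty
    base = [Fraction(v) for v in points[0]]
    diffs = [[Fraction(v) - b for v, b in zip(p, base)] for p in points[1:]]
    return rank(diffs)
```

For example, three collinear points have an affine hull of dimension 1, while three points in general position in the plane have an affine hull of dimension 2.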

We recall the following lemma. 

Lemma 2.1. For any finite sequence of affine spaces $(A_i)_{i \in I}$ with $I \ne \emptyset$, and
for any affine space $A \subseteq \bigcup_{i \in I} A_i$, there exists $i \in I$ such that $A \subseteq A_i$.

For any function $f : D \to \mathbb{Q}^m$ such that $D \subseteq \mathbb{Q}^m$ and for any $i \ge 0$, we
denote by $f^i$ the function defined by the induction $f^0(x) = x$ for any $x \in \mathbb{Q}^m$
and $f^{i+1} = f^i \circ f$. For any functions $f : D \to \mathbb{Q}^n$ and $f' : D' \to \mathbb{Q}^n$ such that
$D, D' \subseteq \mathbb{Q}^m$, functions $f + f'$ and $t.f$ are respectively defined over $D \cap D'$ and $D$
by $(f + f')(x) = f(x) + f'(x)$ and $(t.f)(x) = t.f(x)$. For any function $f : D \to \mathbb{Q}^m$
and for any $X \subseteq \mathbb{Q}^n$ and $Y \subseteq \mathbb{Q}^m$, sets $f(X)$ and $f^{-1}(Y)$ are respectively defined
by $f(X) = \{f(x);\ x \in X \cap D\}$ and $f^{-1}(Y) = \{x \in D;\ f(x) \in Y\}$. A function
$f : D \to \mathbb{Q}^m$ with $D \subseteq \mathbb{Q}^n$ is said to be affine if there exist a matrix $M \in \mathcal{M}_{m,n}(\mathbb{Q})$
and a vector $v \in \mathbb{Q}^m$ such that $f(x) = M.x + v$ for any $x \in D$. When $v = 0$,
such a function is said to be linear.

The following technical lemma will be useful: 

Lemma 2.2. For any linear function $l : \mathbb{Q}^m \to \mathbb{Q}^m$, we can effectively compute
some vector spaces $V_0$, $V_1$ and $V_\infty$ and an integer $p \ge 1$ such that:

- $\mathbb{Q}^m = V_0 + V_1 + V_\infty$,

- $l^m(V_0) = \{0\}$, $l(V_1) = V_1$, $l(V_\infty) = V_\infty$,

- $(l^p - l^0)^m(V_1) = \{0\}$,

- For any $n \ge 1$, $(l^{n.p} - l^0)(V_1) = (l^p - l^0)(V_1)$, and

- For any $n \ge 1$, $(l^{n.p} - l^0)(V_\infty) = V_\infty$.

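A (binary) relation (over $\mathbb{Q}^m$) is a subset of $\mathbb{Q}^m \times \mathbb{Q}^m$. The identity relation
$I$ is defined by $I = \{(x, x);\ x \in \mathbb{Q}^m\}$. The composition of two relations $\mathcal{R}, \mathcal{R}'$ is
written $\mathcal{R}.\mathcal{R}'$ and defined by $(x, y') \in \mathcal{R}.\mathcal{R}'$ if and only if there exists $x' \in \mathbb{Q}^m$
such that $(x, x') \in \mathcal{R}$ and $(x', y') \in \mathcal{R}'$. For any relation $\mathcal{R}$ and for any $i \ge 0$, we
denote by $\mathcal{R}^i$ the relation defined by the induction $\mathcal{R}^0 = I$ and $\mathcal{R}^{i+1} = \mathcal{R}^i.\mathcal{R}$.
A relation $\mathcal{R}$ is said to be reflexive if $I \subseteq \mathcal{R}$ and transitive if $\mathcal{R}.\mathcal{R} \subseteq \mathcal{R}$. The
reflexive and transitive closure $\mathcal{R}^*$ of a relation $\mathcal{R}$ is defined by $\mathcal{R}^* = \bigcup_{i \ge 0} \mathcal{R}^i$.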
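
The definitions of composition, powers and reflexive and transitive closure can be made concrete on a finite universe. This is only an illustrative sketch: the relations in the paper range over an infinite set of rational vectors, whereas the code below assumes a finite universe so that the union of powers stabilises; the names `compose` and `star` are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def compose(r: Rel, r2: Rel) -> Rel:
    """(x, y') is in r.r2 iff some x' has (x, x') in r and (x', y') in r2."""
    return {(x, y) for (x, x1) in r for (x2, y) in r2 if x1 == x2}


def star(r: Rel, universe: Iterable[Hashable]) -> Rel:
    """Union of all powers of r, starting from the identity on the universe."""
    closure: Rel = {(x, x) for x in universe}   # power 0 is the identity
    power: Rel = set(closure)
    while True:
        power = compose(power, r)               # next power of r
        if power <= closure:                    # no new pairs: fixed point reached
            return closure
        closure |= power
```

The loop stops as soon as a power adds no new pair, because every later power is then already contained in the accumulated union.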

3 Semi-affine Star Hull

Recall that we are interested in over-approximating the reachability relation 
of a numerical system by a reflexive and transitive finite union of affine 
relations. In this section, we show that there exists a unique minimal such 
over-approximation called the semi-affine star hull. 

The following proposition proves that the class of finite unions of affine spaces,
called semi-affine spaces, is stable by any infinite intersection. In particular,
the least semi-affine space $\mathrm{saff}(X)$ for the inclusion that contains a given set
$X \subseteq \mathbb{Q}^m$, called the semi-affine hull, can be easily defined as the intersection of
all the semi-affine spaces that contain $X$.

Proposition 3.1. Any finite or infinite intersection of semi-affine spaces is a
semi-affine space.

Proof. Let us first prove by induction over $k \ge -1$ that any non-increasing
sequence of semi-affine spaces $(S_n)_{n \ge 0}$ such that the dimension of $\mathrm{aff}(S_0)$ is
bounded by $k$ is ultimately constant. If $k = -1$, we have $\mathrm{aff}(S_0) = \emptyset$ and in this
case, $(S_n)_{n \ge 0}$ is constant. Now, assume the induction true for an integer $k \ge -1$
and let us consider a non-increasing sequence of semi-affine spaces $(S_n)_{n \in \mathbb{N}}$
such that the dimension of $\mathrm{aff}(S_0)$ is equal to $k + 1$. Remark that if $S_n$ is an
affine space for any $n \ge 0$, then $(S_n)_{n \ge 0}$ is a non-increasing sequence of affine
spaces. In particular, this sequence is ultimately constant. So, we can assume that
there exists an integer $n_0 \ge 0$ such that $S_{n_0}$ is not an affine space. There exists
a finite class $\mathcal{C}$ of affine spaces such that $S_{n_0} = \bigcup_{A \in \mathcal{C}} A$. Let $A \in \mathcal{C}$. From
$A \subseteq S_{n_0} \subseteq S_0 \subseteq \mathrm{aff}(S_0)$, we deduce that the dimension of $A$ is less than or
equal to $k + 1$. Moreover, if it is equal to $k + 1$, from $A \subseteq \mathrm{aff}(S_0)$, we deduce
$A = \mathrm{aff}(S_0)$ and we get that $S_{n_0} = A$ is an affine space, which is a contradiction. As
the sequence $(S_n \cap A)_{n \ge 0}$ is a non-increasing sequence of semi-affine spaces such
that the dimension of $\mathrm{aff}(S_0 \cap A) \subseteq A$ is less than or equal to $k$, the induction
hypothesis proves that there exists $n_A \ge 0$ such that $S_n \cap A = S_{n_A} \cap A$ for
any $n \ge n_A$. Let us consider $N = \max_{A \in \mathcal{C}}(n_0, n_A)$. For any $n \ge N$, we have
$S_n \subseteq S_{n_0} = \bigcup_{A \in \mathcal{C}} A$ and $S_N \cap A = S_n \cap A$. Therefore $S_N = S_n$ for any $n \ge N$
and we have proved the induction.

Now, let us consider a non-empty class $\mathcal{C}$ of semi-affine spaces and assume
by contradiction that $\bigcap_{S \in \mathcal{C}} S$ is not a semi-affine space. We are going to build
by induction an increasing sequence $(\mathcal{C}_n)_{n \ge 0}$ of finite subsets of $\mathcal{C}$ such that
$\bigcap_{S \in \mathcal{C}_{n+1}} S$ is strictly included in $\bigcap_{S \in \mathcal{C}_n} S$. Let $\mathcal{C}_0 = \emptyset$ and assume that $\mathcal{C}_0, \ldots,
\mathcal{C}_n$ are built and let us build $\mathcal{C}_{n+1}$. Remark that $\bigcap_{S \in \mathcal{C}} S$ is not a semi-affine
space. Hence, it cannot be equal to the semi-affine space $\bigcap_{S \in \mathcal{C}_n} S$. In particular,
there exists $S' \in \mathcal{C}$ such that $S' \not\supseteq \bigcap_{S \in \mathcal{C}_n} S$. Let $\mathcal{C}_{n+1} = \mathcal{C}_n \cup \{S'\}$ and remark
that $\bigcap_{S \in \mathcal{C}_{n+1}} S$ is strictly included in $\bigcap_{S \in \mathcal{C}_n} S$. So we can build by induction a
strictly decreasing sequence of semi-affine spaces. This is in contradiction with
the previous paragraph. Therefore $\bigcap_{S \in \mathcal{C}} S$ is semi-affine. □



Example 3.1. The semi-affine hull of a finite subset $X \subseteq \mathbb{Q}^m$ is equal to $X$
because $X$ is the finite union over $x \in X$ of the affine spaces $\{x\}$.

The following lemma will be useful to compute the semi-affine hull of some 
sets. 

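Lemma 3.1.

- For any infinite subset $X \subseteq \mathbb{Q}$, we have $\mathrm{saff}(X) = \mathbb{Q}$.

- For any affine function $f : \mathbb{Q}^n \to \mathbb{Q}^m$ and for any subset $X \subseteq \mathbb{Q}^n$, we have
$\mathrm{saff}(f(X)) = f(\mathrm{saff}(X))$.

- For any $X \subseteq X' \subseteq \mathbb{Q}^m$, we have $\mathrm{saff}(X) \subseteq \mathrm{saff}(X')$.

- For any subsets $X, X' \subseteq \mathbb{Q}^m$, we have $\mathrm{saff}(X \cup X') = \mathrm{saff}(X) \cup \mathrm{saff}(X')$.

- For any subsets $X, X' \subseteq \mathbb{Q}^m$, we have $\mathrm{saff}(X + X') = \mathrm{saff}(X) + \mathrm{saff}(X')$.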



Example 3.2. The semi-affine hull of $\mathbb{N}^m$ is equal to $\mathbb{Q}^m$. In fact, from lemma 3.1,
we deduce $\mathrm{saff}(\mathbb{N}^m) = \sum_{i=1}^{m} \mathrm{saff}(\mathbb{N}.e_i) = \sum_{i=1}^{m} \mathrm{saff}(\mathbb{N}).e_i = \sum_{i=1}^{m} \mathbb{Q}.e_i = \mathbb{Q}^m$.

Proposition 3.1 also proves that the least reflexive and transitive semi-affine
relation $\mathrm{saff}^*(\mathcal{R})$ for the inclusion that contains a given relation $\mathcal{R} \subseteq \mathbb{Q}^m \times \mathbb{Q}^m$,
called the semi-affine star hull, can also be defined as the intersection of all the
reflexive and transitive semi-affine relations that contain $\mathcal{R}$.

Example 3.3. Let us consider the relation $\mathcal{R} = \{(x, x') \in \mathbb{Q} \times \mathbb{Q};\ x' \ge x\}$
corresponding to a transition of a system that uses one clock $x$. In this case
$\mathrm{saff}^*(\mathcal{R}) = \mathbb{Q} \times \mathbb{Q}$.

Example 3.4. Let us consider the relation $\mathcal{R} = \{(x, x') \in \mathbb{N}^2 \times \mathbb{N}^2;\ x_1 \ge 1 \wedge
(x'_1, x'_2) = (0, x_2 + x_1)\}$ that corresponds to the effect of a transition in a
counter system that "transfers the value of a counter $x_1$ into a counter $x_2$". We
have $\mathrm{saff}^*(\mathcal{R}) = I \cup \{(x, x') \in \mathbb{Q}^2 \times \mathbb{Q}^2;\ (x'_1, x'_2) = (0, x_2 + x_1)\}$, which is not an
affine space.

4 Computation of the Semi-affine Star Hull of an Affine Relation

This section contains the proof that the semi-affine star hull of an affine relation
$\mathcal{R}$ is effectively computable. In order to get this result, we show that this
computation can be reduced to the computation of the semi-affine hull of
$\{(x, f^i(x));\ i \ge 0\}$ where $f$ is an affine function associated to $\mathcal{R}$.

The definition domain $D(\mathcal{R})$ of an affine relation $\mathcal{R}$ is the affine space naturally
defined by $D(\mathcal{R}) = \{x \in \mathbb{Q}^m;\ \exists y \in \mathbb{Q}^m\ (x, y) \in \mathcal{R}\}$. For any $x \in D(\mathcal{R})$,
the affine space $\mathcal{R}_x = \{y \in \mathbb{Q}^m;\ (x, y) \in \mathcal{R}\}$ is not empty. Even if $\mathcal{R}_x$ obviously
depends on $x$, the following lemma proves that this is no longer true for
its direction $\overrightarrow{\mathcal{R}_x}$.

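Lemma 4.1. For any non-empty affine relation $\mathcal{R}$ and for any $x_1, x_2 \in D(\mathcal{R})$
we have $\overrightarrow{\mathcal{R}_{x_1}} = \overrightarrow{\mathcal{R}_{x_2}}$.

Proof. Let $v \in \overrightarrow{\mathcal{R}_{x_1}}$ and let us prove that $v \in \overrightarrow{\mathcal{R}_{x_2}}$. There exist $(x_1, y), (x_1, y') \in
\mathcal{R}$ such that $v = y' - y$. As $x_2 \in D(\mathcal{R})$, there exists $(x_2, y_2) \in \mathcal{R}$. As $\mathcal{R}$ is
an affine space, we have $(x_2, y_2 + v) = (x_2, y_2) + (x_1, y') - (x_1, y) \in \mathcal{R}$. Hence
$y_2, y_2 + v \in \mathcal{R}_{x_2}$ and we have proved that $v \in \overrightarrow{\mathcal{R}_{x_2}}$. □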

Therefore, for any non-empty affine relation $\mathcal{R}$, there exists a unique vector
space $V(\mathcal{R})$, called the direction of $\mathcal{R}$, such that for any $x \in D(\mathcal{R})$, we have
$\overrightarrow{\mathcal{R}_x} = V(\mathcal{R})$. We make the convention $V(\emptyset) = \mathbb{Q}^m$.

Example 4.1. Let $f : D \to \mathbb{Q}^m$ be an affine function defined over a non-empty
affine space $D$ and let $\mathcal{R} = \{(x, f(x));\ x \in D\}$ be the graph of $f$. In this case,
$D(\mathcal{R}) = D$ and $V(\mathcal{R}) = \{0\}$. In fact, we can prove that a non-empty affine
relation $\mathcal{R}$ is the graph of an affine function if and only if $V(\mathcal{R}) = \{0\}$.

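Remark that $(D(\mathcal{R}^i))_{i \ge 0}$ is a decreasing sequence of affine spaces whereas
$(V(\mathcal{R}^i))_{i \ge 0}$ is an increasing sequence of vector spaces. Let us denote by $D(\mathcal{R}^\infty) =
\bigcap_{i \ge 0} D(\mathcal{R}^i)$ and $V(\mathcal{R}^\infty) = \bigcup_{i \ge 0} V(\mathcal{R}^i)$. The following lemma will be useful to
compute effectively these sets.

Lemma 4.2. For any affine relation $\mathcal{R}$ over $\mathbb{Q}^m$ such that $\mathcal{R}^{m+1} \ne \emptyset$, we have
$D(\mathcal{R}^\infty) = D(\mathcal{R}^m)$ and $V(\mathcal{R}^\infty) = V(\mathcal{R}^m)$.

Sets $D(\mathcal{R}^\infty)$ and $V(\mathcal{R}^\infty)$ enable us to associate to $\mathcal{R}$ an affine function $f$
such that $\mathcal{R}^i$ is easily related to $f^i$ for any $i \ge 0$.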

Proposition 4.1. For any affine relation $\mathcal{R}$ such that $\mathcal{R}^{m+1} \ne \emptyset$, we can
effectively compute an affine function $f : D(\mathcal{R}^\infty) \to D(\mathcal{R}^\infty)$ such that for any
$i \ge m$, we have:

$$\mathcal{R}^i = \{(x, f^i(x));\ x \in D(\mathcal{R}^i)\} + (\{0\} \times V(\mathcal{R}^i))$$

Proof. We first prove that we can compute an affine function / : Q*” — Q™ 
such that C D{TV) for any i > 0 and such that (x, /(x)) G TZ for 

any x G D{TZ). As the sequence (ZI(7?.*))o<i<Tn is decreasing, we can compute 
a decreasing sequence Bq, Bm of finite subsets of Q™ such that Bi is an 
affine basis of D{TZ'‘). In particular Bq is an affine basis of Q™ = D{TZ^). Let 
us build a function F : Bq ^ Q™. For any b G Bq\Bi, let F{b) = b. For any 
z G {1, . . . , m — 1} and for any b G Bi\Bi+i, there exists (b,y) G TZ^. So, we 
can find F{b) G Q™ such that (b,F{b)) G TZ and {F{b),y) G TZ^~^. That means 
F{b) G D{TZ^~^). Finally, let b G B^. As 6 G D{TZ^) = D{TZ^~^^), there exists 
y G Q™ such that (6, y) G Hence, there exists F{b) G Q™ such that 

{b,F{b)) G TZ and {f{b),y) G TZ"^. So F{b) G D{TZ°°). As Bq is an affine basis 
of Q™, there exists a unique affine function / effectively computable such that 
f{b) = F{b) for any b G Bq. As for any b G 77(7^*+^), we have f{b) G D{TZ^), 
we deduce /(77(7^*+^)) C D{TZ^). Moreover, as (b,f{b)) G TZ for any b G Bi, we 
deduce (x, /(x)) G TZ for any x G D{TZ). 

Now, let us prove by induction over $i$ the following equality: $\mathcal{R}^i = \{(x, x');\ x \in
D(\mathcal{R}^i);\ x' \in f^i(x) + V(\mathcal{R}^i)\}$. Remark that for $i = 0$ the equality is true. Assume
it true for an integer $i$ and let $x \in D(\mathcal{R}^{i+1})$. We have $f(x) \in D(\mathcal{R}^i)$. The
induction hypothesis proves that $(f(x), f^i(f(x))) \in \mathcal{R}^i$. As $(x, f(x)) \in \mathcal{R}$, we
have $(x, f^{i+1}(x)) \in \mathcal{R}^{i+1}$. By definition of $V(\mathcal{R}^{i+1})$, we have proved that $\mathcal{R}^{i+1} =
\{(x, x');\ x \in D(\mathcal{R}^{i+1});\ x' \in f^{i+1}(x) + V(\mathcal{R}^{i+1})\}$, which proves the induction.

As $f(D(\mathcal{R}^{m+1})) \subseteq D(\mathcal{R}^m)$ and $D(\mathcal{R}^{m+1}) = D(\mathcal{R}^m) = D(\mathcal{R}^\infty)$, we have
proved the inclusion $f(D(\mathcal{R}^\infty)) \subseteq D(\mathcal{R}^\infty)$. Moreover, from $D(\mathcal{R}^i) = D(\mathcal{R}^\infty)$
and $V(\mathcal{R}^i) = V(\mathcal{R}^\infty)$ for any $i \ge m$, we have proved the proposition. □

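Let us now fix an affine relation $\mathcal{R}$ such that $D(\mathcal{R}^\infty) \neq \emptyset$. We denote by $f :
D(\mathcal{R}^\infty) \to D(\mathcal{R}^\infty)$ an affine function satisfying proposition 4.1. Let $l : D(\mathcal{R}^\infty) \to
D(\mathcal{R}^\infty)$ be the linear function defined by $f(x) - f(x') = l(x - x')$ for any $x, x' \in
D(\mathcal{R}^\infty)$. Let $V_0$, $V_1$ and $V_\infty$ and $p \ge 1$ satisfying lemma 2.2. We are going to
prove that $\mathrm{saff}(\mathcal{R}^*) = \mathrm{saff}^*(\mathcal{R}) = \mathcal{R}'$ where $\mathcal{R}'$ is defined by:

$$\mathcal{R}' = \bigcup_{i=0}^{m-1} \mathcal{R}^i \ \cup\ \bigcup_{j=0}^{p-1} \mathrm{aff}(\mathcal{R}^{m+j} \cup \mathcal{R}^{m+j+p})$$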

To obtain such an equality, we first show that the vector space 
Wd = Q.{ffi'+P{d)-f^{d)) + V^ + {lP-f){Vi) does not depend on d G D{TZ°°). 



Lemma 4.3. For any $d_1, d_2 \in D(\mathcal{R}^\infty)$, we have $W_{d_1} = W_{d_2}$.

Proof. From $f^{m+p}(d_1) - f^m(d_1) = (f^{m+p}(d_2) - f^m(d_2)) + (l^{m+p} - l^m)(d_1 - d_2)$,
we deduce the inclusion $\mathbb{Q}.(f^{m+p}(d_1) - f^m(d_1)) + (l^{m+p} - l^m)(V_1) \subseteq
\mathbb{Q}.(f^{m+p}(d_2) - f^m(d_2)) + (l^{m+p} - l^m)(V_1)$. In particular, we have $W_{d_1} \subseteq W_{d_2}$.
By symmetry, we have $W_{d_1} = W_{d_2}$. □

Therefore, we can naturally define $W$ as the vector space satisfying $W = W_d$ for
any $d \in D(\mathcal{R}^\infty)$.

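Lemma 4.4. For any $j \ge m$ and $n_1 \neq n_2$, we have:

$$\mathcal{R}^j + (\{0\} \times W) = \mathrm{aff}(\mathcal{R}^{j+n_1.p} \cup \mathcal{R}^{j+n_2.p}) = (W \times \{0\}) + \mathcal{R}^j$$

Thanks to the previous lemma, we prove the following corollary:

Corollary 4.1. We have $\mathrm{saff}(\mathcal{R}^*) = \mathrm{saff}^*(\mathcal{R}) = \mathcal{R}'$.

Proof. Let us first prove that $\mathrm{saff}^*(\mathcal{R}) \subseteq \mathcal{R}'$. Lemma 4.4 shows that for any
$j, j' \ge m$, we have $\mathcal{R}.\mathrm{aff}(\mathcal{R}^j \cup \mathcal{R}^{j+p}) = \mathcal{R}^{j+1} + (\{0\} \times W) \subseteq \mathcal{R}'$ and
$\mathrm{aff}(\mathcal{R}^j \cup \mathcal{R}^{j+p}).\mathrm{aff}(\mathcal{R}^{j'} \cup \mathcal{R}^{j'+p}) = ((W \times \{0\}) + \mathcal{R}^j).(\mathcal{R}^{j'} + (\{0\} \times W)) \subseteq
(W \times \{0\}) + \mathcal{R}^{j+j'} + (\{0\} \times W) = \mathcal{R}^{j+j'} + (\{0\} \times W) \subseteq \mathcal{R}'$. Hence $\mathcal{R}'$ is transitive.
Moreover, as $\mathcal{R}'$ is reflexive ($I \subseteq \mathcal{R}'$) and contains $\mathcal{R}$, by minimality of the
semi-affine star hull, we have $\mathrm{saff}^*(\mathcal{R}) \subseteq \mathcal{R}'$.

Next, we prove $\mathcal{R}' \subseteq \mathrm{saff}(\mathcal{R}^*)$. As $\mathrm{saff}(\mathcal{R}^*)$ is a semi-affine space, there exists
a finite class of affine spaces $\mathcal{C}$ such that $\mathrm{saff}(\mathcal{R}^*) = \bigcup_{A \in \mathcal{C}} A$. Let $j \ge m$. As
for any $n \ge 0$, we have $\mathcal{R}^{j+n.p} \subseteq \mathrm{saff}(\mathcal{R}^*)$, lemma 2.1 shows that there exists
$A_n \in \mathcal{C}$ such that $\mathcal{R}^{j+n.p} \subseteq A_n$. However, the class $\mathcal{C}$ is finite. So there exist
$n_1 \neq n_2$ such that $A = A_{n_1} = A_{n_2}$. From $\mathcal{R}^{j+n_1.p} \cup \mathcal{R}^{j+n_2.p} \subseteq A$ we deduce by
minimality of the affine hull that $\mathrm{aff}(\mathcal{R}^{j+n_1.p} \cup \mathcal{R}^{j+n_2.p}) \subseteq A \subseteq \mathrm{saff}(\mathcal{R}^*)$. From
lemma 4.4, we deduce that $\mathcal{R}' \subseteq \mathrm{saff}(\mathcal{R}^*)$.

From $\mathrm{saff}(\mathcal{R}^*) \subseteq \mathrm{saff}^*(\mathcal{R})$ and the two previously proved inclusions, we are
done. □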

In particular, we have proved the following theorem. 

Theorem 4.1. The semi-affine star hull of any affine relation is effectively com- 
putable. 

Remark 4.1. The hard part in implementing an algorithm that computes the
semi-affine star hull of an affine relation is the computation of an integer $p \ge 1$ that
satisfies lemma 2.2. In fact, once $p$ is computed, we can just use the equality
$\mathrm{saff}^*(\mathcal{R}) = \bigcup_{i=0}^{m-1} \mathcal{R}^i \cup \bigcup_{j=0}^{p-1} \mathrm{aff}(\mathcal{R}^{m+j} \cup \mathcal{R}^{m+j+p})$. In [Ler03], a precise analysis
of the complexity proves that such an integer $p$ can be computed in polynomial
time and it can be bounded by p < (4m) That shows that the semi-affine star 
hull of an affine relation is computable in exponential time in the worst case. 



5 Computation of the Semi-affine Star Hull of a
Semi-affine Relation

In the previous section, we have proved that there exists an algorithm that
computes the semi-affine star hull of any affine relation. We are going to prove
that this algorithm can be naturally used to implement a semi-algorithm that
computes the semi-affine star hull of any semi-affine relation when it terminates.
Even if we are not able to prove the termination of our semi-algorithm in the
general case, we prove that it is sufficient for computing the semi-affine star hull
of the reachability relation of any upward-closed finite linear system, a class
of systems that contains any reset/transfer Petri Net [DFS98,Cia94] and any
Broadcast Protocol [EN98,Del00].

By definition, for any semi-affine space $S$, there exists a finite class $\mathcal{C}$ of affine
spaces such that $S = \bigcup_{A \in \mathcal{C}} A$. Remark that such a finite class $\mathcal{C}$ is not unique.
However, if we only consider the maximal elements of $\mathcal{C}$ for the inclusion, we are
going to prove that we obtain a class that only depends on $S$.

Definition 5.1. A component of a semi-affine space $S$ is a maximal affine space
$A \subseteq S$ for the inclusion. The set of components of $S$ is written $\mathrm{comp}(S)$.

Proposition 5.1. For any semi-affine space $S$, the set of components $\mathrm{comp}(S)$
is finite and satisfies $S = \bigcup_{A \in \mathrm{comp}(S)} A$. Moreover, for any finite class $\mathcal{C}$ of
affine spaces such that $S = \bigcup_{A \in \mathcal{C}} A$, the set of maximal elements in $\mathcal{C}$ for the
inclusion is equal to $\mathrm{comp}(S)$.

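Proof. Let $S$ be a semi-affine space. There exists a finite class $\mathcal{C}$ of affine spaces
such that $S = \bigcup_{A \in \mathcal{C}} A$. Let us consider the class $\mathcal{C}_0$ of maximal affine spaces
in $\mathcal{C}$ for the inclusion and let us prove that $\mathrm{comp}(S) = \mathcal{C}_0$. Let $A \in \mathrm{comp}(S)$.
From $A \subseteq \bigcup_{A' \in \mathcal{C}} A'$ and lemma 2.1, we deduce that there exists $A' \in \mathcal{C}$ such
that $A \subseteq A'$. By definition of $\mathcal{C}_0$, there exists $A_0 \in \mathcal{C}_0$ such that $A' \subseteq A_0$.
Moreover, as $A \in \mathrm{comp}(S)$ and $A \subseteq A_0 \subseteq S$, by maximality of $A$, we have
$A = A_0$ and we have proved that $A \in \mathcal{C}_0$. Now, let us consider $A_0 \in \mathcal{C}_0$. Let us
consider an affine space $A$ such that $A_0 \subseteq A \subseteq S$ and let us prove that $A_0 = A$.
From $A \subseteq \bigcup_{A' \in \mathcal{C}} A'$ we deduce that there exists $A' \in \mathcal{C}$ such that $A \subseteq A'$.
As $A_0 \subseteq A'$ and $A' \in \mathcal{C}$, by maximality of $A_0$, we deduce $A_0 = A'$. Inclusions
$A_0 \subseteq A \subseteq A' = A_0$ show that $A_0 = A$ and we have proved that $A_0 \subseteq S$ is
maximal. Hence $A_0 \in \mathrm{comp}(S)$. □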

Remark 5.1. The previous proposition makes it possible to canonically represent a
semi-affine space by its finite set of components. This is a useful property when
implementing a semi-affine library.
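
The pruning step behind this canonical representation (keep only the maximal elements of a finite class for the inclusion) can be sketched as follows. This is a minimal illustration, not code from the library mentioned above: the function name `maximal_elements` is hypothetical, and finite Python sets stand in for affine spaces. The uniqueness statement of proposition 5.1 relies on lemma 2.1, which holds for affine spaces but not for arbitrary sets, so the stand-in only illustrates the pruning itself.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def maximal_elements(items: Iterable[T], included: Callable[[T, T], bool]) -> List[T]:
    """Return the distinct elements of `items` that are maximal for `included`."""
    unique: List[T] = []
    for a in items:
        if a not in unique:
            unique.append(a)
    # An element is maximal when it is not included in any other element.
    return [a for a in unique
            if not any(a != b and included(a, b) for b in unique)]


def is_subset(a: frozenset, b: frozenset) -> bool:
    return a <= b


# Assumption: frozensets play the role of affine spaces in this sketch.
cover_1 = [frozenset({1, 2}), frozenset({1}), frozenset({3})]
cover_2 = [frozenset({3}), frozenset({2}), frozenset({1, 2})]

components_1 = set(maximal_elements(cover_1, is_subset))
components_2 = set(maximal_elements(cover_2, is_subset))
```

Both covers describe the same union and are pruned to the same two maximal elements, which is the behaviour the remark exploits for semi-affine spaces.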

We can now provide our semi-algorithm that computes the semi-affine star 
hull of a semi-affine relation. 



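Algorithm 1 Semi-algorithm that computes the semi-affine star hull.
1: input: A semi-affine relation $\mathcal{R}_0$.
2: output: The semi-affine star hull of $\mathcal{R}_0$.
3:
4: $\mathcal{R} \leftarrow \bigcup_{\mathcal{R}_a \in \mathrm{comp}(\mathcal{R}_0)} \mathrm{saff}^*(\mathcal{R}_a)$
5: while $\mathcal{R}.\mathcal{R} \not\subseteq \mathcal{R}$ do
6:   $\mathcal{R} \leftarrow \bigcup_{\mathcal{R}_a, \mathcal{R}_b \in \mathrm{comp}(\mathcal{R})} \mathrm{saff}^*(\mathcal{R}_a.\mathcal{R}_b)$
7: return $\mathcal{R}$

Theorem 5.1. When semi-algorithm 1 stops, it returns the semi-affine star hull
of the input relation.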



Definition 5.2. A numerical system $N$ is a tuple $N = (\Sigma, (\mathcal{R}_a)_{a \in \Sigma})$ where $\Sigma$
is a finite set of actions and $\mathcal{R}_a$ is a relation over $\mathbb{Q}^m$ defined by a formula in
the first order theory $(\mathbb{Q}^m, \mathbb{N}^m, +, \le)$.

The termination of semi-algorithm 1 is an open problem. We are going to
prove that under some natural conditions over $N$ (often met in practice), the
semi-affine star hull of the reachability relation $\mathcal{R}_N^*$, defined as the reflexive
and transitive closure of the one-step reachability relation
$\mathcal{R}_N = \bigcup_{a \in \Sigma} \mathcal{R}_a$, is effectively computable.

An affine function $f : D \to \mathbb{N}^m$ defined over a non empty set $D \subseteq \mathbb{N}^m$ such
that $D + \mathbb{N}^m \subseteq D$ is said to be upward-closed. Remark that there exists a unique
couple $(M_f, v_f)$ where $M_f \in \mathcal{M}_m(\mathbb{Q})$ and $v_f \in \mathbb{Q}^m$ such that $f(x) = M_f.x + v_f$
for any $x \in D_f$, where $D_f = D$ denotes the domain of $f$.

Definition 5.3 ([FL02]). An upward-closed finite linear system $P$ is a tuple
$P = (\Sigma, (f_a)_{a \in \Sigma})$ where $\Sigma$ is a non empty finite alphabet of actions and $f_a$ is an
upward-closed affine function, such that the monoid $\mathcal{M}_P$ generated by the product
of the matrices $M_{f_a}$, $a \in \Sigma$, is finite.

A numerical system $N = (\Sigma, (\mathcal{R}_a)_{a \in \Sigma})$ can be naturally associated
to an upward-closed finite linear system $P = (\Sigma, (f_a)_{a \in \Sigma})$ by $\mathcal{R}_a =
\{(x, f_a(x));\ x \in D_a\}$. The reachability relation in one step $\mathcal{R}_P$ is defined by
$\mathcal{R}_P = \bigcup_{a \in \Sigma} \mathcal{R}_a$ and the relation $\mathcal{R}_P^*$ is naturally called the reachability relation.
Recall that the reachability problem, which consists in deciding whether a given couple
$(x, x')$ is in $\mathcal{R}_P^*$, was proved to be undecidable for reset/transfer Petri nets
[DFS98], a subclass of upward-closed finite linear systems. However, we are going
to prove that the semi-affine star hull of the reachability relation is effectively
computable by using semi-algorithm 1.

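Proposition 5.2. For any upward-closed finite linear system $P$, the semi-affine
hull $\mathrm{saff}(\mathcal{R}_P)$ is effectively computable and it is equal to
$\bigcup_{a \in \Sigma} \{(x, M_{f_a}.x + v_{f_a});\ x \in \mathbb{Q}^m\}$.

Proof. Let us consider an upward-closed affine function $f$. We have just to prove
that $\mathrm{saff}(\mathcal{R}) = \{(x, M_f.x + v_f);\ x \in \mathbb{Q}^m\}$ where $\mathcal{R} = \{(x, f(x));\ x \in D_f\}$. Let us
consider the affine function $F : \mathbb{Q}^m \to \mathbb{Q}^m \times \mathbb{Q}^m$ defined by $F(x) = (x, M_f.x + v_f)$
for any $x \in \mathbb{Q}^m$. As $\mathcal{R} = F(D_f)$, lemma 3.1 proves that $\mathrm{saff}(\mathcal{R}) = F(\mathrm{saff}(D_f))$.
As $D_f$ is not empty, there exists $d \in D_f$. Moreover, as $D_f + \mathbb{N}^m \subseteq D_f$, we
deduce $d + \mathbb{N}^m \subseteq D_f$. Lemma 3.1 proves that $\mathrm{saff}(d + \mathbb{N}^m) = d + \mathrm{saff}(\mathbb{N}^m)$ and
example 3.2 shows that $\mathrm{saff}(\mathbb{N}^m) = \mathbb{Q}^m$. Hence $\mathrm{saff}(D_f) = \mathbb{Q}^m$ and
$\mathrm{saff}(\mathcal{R}) = F(\mathbb{Q}^m)$. □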



Proposition 5.3. For any upward-closed finite linear system $P$,
semi-algorithm 1 terminates on the input $\mathrm{saff}(\mathcal{R}_P)$.

Proof. Let us consider a non empty finite alphabet $\Sigma$, and a finite sequence
$(f_a)_{a \in \Sigma}$ of affine functions $f_a : \mathbb{Q}^m \to \mathbb{Q}^m$. For any word $\sigma = a_1 \ldots a_n$ over $\Sigma$,
let us consider the affine function $f_\sigma$ naturally defined by $f_\sigma = f_{a_n} \circ \cdots \circ f_{a_1}$
if $n \ge 1$ and $f_\epsilon(x) = x$ for any $x \in \mathbb{Q}^m$ otherwise. The relation $\mathcal{R}_\sigma$ is defined
by $\mathcal{R}_\sigma = \{(x, f_\sigma(x));\ x \in \mathbb{Q}^m\}$. Remark that there exists a unique matrix
$M_\sigma \in \mathcal{M}_m(\mathbb{Q})$ and a unique vector $v_\sigma \in \mathbb{Q}^m$ such that $f_\sigma(x) = M_\sigma.x + v_\sigma$ for
any $x \in \mathbb{Q}^m$. Vector $v_\sigma$ can be easily linked to the sequence $(M_a, v_a)_{a \in \Sigma}$ by
introducing for any word $\sigma = a_1 \ldots a_n$ and for any $i \in \{0, \ldots, n\}$, the word
$\sigma[i] = a_{i+1} \ldots a_n$ if $i < n$ and $\sigma[n] = \epsilon$ otherwise. Thanks to this notation, we
have $v_\sigma = \sum_{i=1}^{n} M_{\sigma[i]}.v_{a_i}$. We assume that the monoid $\mathcal{M} = \{M_\sigma;\ \sigma \in \Sigma^*\}$
is finite and we denote by $k$ the cardinal of $\mathcal{M}$. We have to prove that
semi-algorithm 1 terminates on $\mathcal{R}_0 = \bigcup_{a \in \Sigma} \{(x, f_a(x));\ x \in \mathbb{Q}^m\}$.
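
The link between $v_\sigma$ and the pairs $(M_a, v_a)$ can be checked numerically. The sketch below is an illustration only: the two-action system `SYSTEM` and all function names are hypothetical, chosen for the example and not taken from the paper. It computes $(M_\sigma, v_\sigma)$ by composing the functions one action at a time, and compares $v_\sigma$ with the sum $\sum_{i=1}^{n} M_{\sigma[i]}.v_{a_i}$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_vec(a, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def vec_add(u, v):
    return [p + q for p, q in zip(u, v)]


def compose(sigma, system, m):
    """(M_sigma, v_sigma) for f_sigma = f_{a_n} o ... o f_{a_1}; identity for the empty word."""
    M = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    v = [Fraction(0)] * m
    for a in sigma:                      # a_1 is applied first
        Ma, va = system[a]
        M, v = mat_mul(Ma, M), vec_add(mat_vec(Ma, v), va)
    return M, v


def v_by_formula(sigma, system, m):
    """v_sigma as the sum over i = 1..n of M_{sigma[i]} . v_{a_i}, with sigma[i] = a_{i+1}...a_n."""
    total = [Fraction(0)] * m
    for i in range(1, len(sigma) + 1):
        M_suffix, _ = compose(sigma[i:], system, m)   # matrix of the suffix sigma[i]
        total = vec_add(total, mat_vec(M_suffix, system[sigma[i - 1]][1]))
    return total


# Hypothetical system over Q^2: "a" is a transfer-like action, "b" swaps the counters.
SYSTEM = {
    "a": ([[0, 0], [1, 1]], [0, 1]),
    "b": ([[0, 1], [1, 0]], [1, 0]),
}
WORDS = ["", "a", "ab", "aab", "bab", "abba"]
agreement = all(compose(w, SYSTEM, 2)[1] == v_by_formula(w, SYSTEM, 2) for w in WORDS)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is not affected by rounding.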

For any word a, let us denote by 6^ = {Ma[i]] i G {0,... , |cr|}} and 
let Wa be the vector space generated by the vectors M.v^ where M G Ca 
and w G is such that M.M^, = M. We can now define the relation 

^' = + ({0} X Wa)). 

We first prove that saff*(7^) C TZ' by proving that TZ' is transitive and semi- 
affine. 

In order to prove the transitivity, let x, y, y' G Q™ such that {x, y) G TZa^ + 
({0} X Wai)) and {y,y') G TZa-, + ({0} x Wa^))- We have y = faA^) + 
where v\ G and y' = fa^iv) + ^2 where V 2 G Wa 2 - In particular y' = 

faia 2 {x) + Ma^Ai) + v^. However, by construction of Wa, we have Ma^iWa-f) + 
Wa 2 Q Waia 2 - In particular Ma^ivi) + V 2 G Wa^a 2 Emd we have proved that 
y' G faia 2 {x) + Wa^a 2 - We have proved that TZ' is transitive. 

To show that TZ' is semi-affine, we are going to prove that for any word a such 
that \a\ > there exists a word a' such that |cr'| < \a\ and TZa + ({0} x Wa) C 
TZa' + ({0} X Wa'). 

Let us prove that there exists j < f such that j' — j < k, May] = May] 
and Qa]j] = As the sequence (C,T[fc.i])o<i<fc is a decreasing sequence of sets 

that contain at least 1 element and at most k elements, there exists 1 < z < fc 
such that Ga]k.{i-i)] = Ccr[fc,i]. Moreover, as |M| = k, there exists j,f such that 
k.{i — 1) < j < f < k.i and May] = May]. Now, remark that f — j < k, 

^a-y] = ^a]j'] 3'nd Qa]j] = ’ 

As j < f, there exists a word w G such that a[j] = w{a[j'\). Let w' be 

the word such that cr = w'{(j[j]). Let us prove that we can consider a' = w{a[j'\). 
As Ca]j] = we have &a = 6<t' and in particular Wa = Wa'. Moreover, from 

Ma]j] = May], we deduce May]Ma, = May]. From May] G we deduce 
Ma]j'].Vu] G Wa'. Now, just remark that for any x G Q™, we have fa{x) = 
fw' .w.(a]j']){,x) — M^ay](,Ma,.f-w'(,x) v-w) Xa]j'] — fa'ix) May].v-a) . We have 
proved TZa + ({0} x Wa) Q TZa' + ({0} x Wa'). 

Therefore, TZ' is semi-affine. 

As TZ' is a reflexive and transitive semi-affine relation that contains TZq, by 
minimality of the semi-affine star hull, we deduce saff*(7?.o) C TZ'. 
Now, remark that any word $\sigma \in \Sigma^*$ can be decomposed into $\sigma = \sigma_0 \cdots \sigma_m$
such that there exists a sequence {wj)i<j<rn of words in satisfying 

and such that Wa = Let us 

prove that we have the following equality: 

n, + ({0} X W,) = 7Z,,.saS(7Z*^J.7Z,, . . . safr(7^;^).7^,^ (1) 

Let j € {1, ■ • • ,m} and let us consider the affine function Fj : Q'" x Q™ — >■ 
Qm ^ Qm defined by Fj{x,x') = {x, f„j...cr„,{x')). As 
we have = fa^...aAx) + for any x £ 

and n G N. Hence, Fj{x, f^.{x')) = Fj{x,x') + for any 

{x,x') G Q™ X Q™. Lemma 3.1 proves that saff({J^j(a:, (x')); n > 0; x,x' £ 
Q-}) = f,(Q™ X + and 

saS{{Fj{x,f^.{x'))] n > 0; x,x' £ Q™}) = Fj(saff({(x, (x'); n > 0; x,x' £ 
Q™} = saff(7^^^.).7^,^^..„,J^. We have proved that = 

saS(TZl^^).Fa-j...(Tm- immediate induction proves the equality (1). 

Finally, let us consider the semi-affine relation TZi where i > 1 is equal to the 
relation TZ in line 6 at its ith execution. Let zq > ln 2 (fc^). By construction, for 
any z > zq, we have saff*(7^.u,) C TZi for any word w £ and TZcr C TZi for any 
a such that |cr| < k^. Let zi > ln 2 (m) and now remark that for any z > zq + zi, 
and for any word a such that |cr| < k"^ , we have 7?.cr + ({0} x W„) C TZi. Therefore, 
TZ' CTZi- From TZi C saff*(7^o) and saff*(7^o) ^ 'kZ', we deduce TZi = saff*(7^o)- 
In particular, after at most Zq + zi times line 6 is executed, the condition in line 
5 is no longer true and the semi-algorithm terminates. □ 



Remark 5.2. We can bound the time taken by semi-algorithm 1 to compute
$\mathrm{saff}^*(\mathcal{R}_P)$ exponentially in the size of $\mathrm{saff}(\mathcal{R}_P)$ and the cardinal of the monoid
$\mathcal{M}_P$. Recall that an immediate corollary of [MS77] makes it possible to bound the
cardinal of $\mathcal{M}_P$ by an exponential function that only depends on $m$.

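The following lemma is important because it proves that if we can effectively
compute the semi-affine hull of a relation $\mathcal{R}$ and if we can also compute the
semi-affine star hull of this semi-affine relation, then in fact we can compute the
semi-affine star hull of $\mathcal{R}^*$.

Lemma 5.1. For any relation $\mathcal{R}$ over $\mathbb{Q}^m$, we have $\mathrm{saff}^*(\mathcal{R}^*) = \mathrm{saff}^*(\mathrm{saff}(\mathcal{R}))$.

Proof. From $\mathcal{R} \subseteq \mathcal{R}^* \subseteq \mathrm{saff}^*(\mathcal{R}^*)$, we deduce by minimality of the semi-affine
hull the inclusion $\mathrm{saff}(\mathcal{R}) \subseteq \mathrm{saff}^*(\mathcal{R}^*)$. By minimality of the semi-affine star hull
we get $\mathrm{saff}^*(\mathrm{saff}(\mathcal{R})) \subseteq \mathrm{saff}^*(\mathcal{R}^*)$. Moreover, as $\mathcal{R} \subseteq \mathrm{saff}(\mathcal{R}) \subseteq \mathrm{saff}^*(\mathrm{saff}(\mathcal{R}))$
and $\mathrm{saff}^*(\mathrm{saff}(\mathcal{R}))$ is reflexive and transitive, we deduce $\mathcal{R}^* \subseteq \mathrm{saff}^*(\mathrm{saff}(\mathcal{R}))$. By
minimality of the semi-affine star hull, we obtain the other inclusion $\mathrm{saff}^*(\mathcal{R}^*) \subseteq
\mathrm{saff}^*(\mathrm{saff}(\mathcal{R}))$. □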

Corollary 5.1. The semi-affine star hull of the reachability relation of an
upward-closed finite linear system is effectively computable.

Proof. Let us consider an upward-closed finite linear system $P$. Proposition
5.2 proves that $\mathrm{saff}(\mathcal{R}_P)$ is computable. Proposition 5.3 proves that
semi-algorithm 1 terminates on $\mathrm{saff}(\mathcal{R}_P)$ and it computes $\mathrm{saff}^*(\mathrm{saff}(\mathcal{R}_P))$.
Lemma 5.1 shows that this semi-affine relation is equal to $\mathrm{saff}^*(\mathcal{R}_P^*)$. □



6 Applications and Implementations 

In our symbolic model-checker Fast [BFLP03], a plug-in that computes the
semi-affine star hull of an affine relation has been implemented. Moreover, in
order to experiment with semi-algorithm 1 over the 40 counter systems provided
on the FAST homepage [Fas], we have implemented an algorithm that
computes the semi-affine hull of a set represented by a Number Decision Diagram
(NDD) [WB00,Ler04], the internal symbolic representation used by the
tools Fast and Lash [Las]. All the details concerning this implementation can
be found in [Ler03].

Semi-algorithm 1 terminates on the computation of the semi-affine star hull
of the reachability relation of almost all the 40 systems. In some cases the
computation was not possible because the reachability relation was finite but very
large. In fact, in this case, example 3.1 shows that the number of components of
the semi-affine star hull is equal to the number of elements in the reachability
relation. Some new symbolic representations for semi-affine spaces have to be
developed in order to cope with this state space explosion problem.

Rather than providing a long table with the semi-affine star hull of the
reachability relation of the 40 systems we have considered, we illustrate our
method on one of these systems, the MESI protocol, a simple cache coherence
protocol studied in [EN98].



The MESI protocol is an upward-closed finite linear system $P = (\Sigma, (f_a)_{a \in \Sigma})$
that uses $m = 4$ counters and $|\Sigma| = 3$ actions $\Sigma = \{a_1, a_2, a_3\}$. Vectors in
$\mathbb{Q}^m$ are denoted by $(x_{mo}, x_{ex}, x_{sh}, x_{inv})$ (the names of the variables intuitively
correspond to the number of memory lines in the states modified, exclusive, shared
and invalid). We have:

- $D_{a_1} = \{x \in \mathbb{N}^4;\ x_{inv} \ge 1\}$
  and $f_{a_1}(x_{mo}, x_{ex}, x_{sh}, x_{inv}) = (0, 0, x_{sh} + x_{ex} + x_{mo} + 1, x_{inv} - 1)$.
- $D_{a_2} = \{x \in \mathbb{N}^4;\ x_{ex} \ge 1\}$
  and $f_{a_2}(x_{mo}, x_{ex}, x_{sh}, x_{inv}) = (x_{mo} + 1, x_{ex} - 1, x_{sh}, x_{inv})$.
- $D_{a_3} = \{x \in \mathbb{N}^4;\ x_{sh} \ge 1 \text{ or } x_{inv} \ge 1\}$
  and $f_{a_3}(x_{mo}, x_{ex}, x_{sh}, x_{inv}) = (0, 1, 0, x_{mo} + x_{ex} + x_{sh} + x_{inv} - 1)$.

The set of initial states of the MESI protocol is given by $X_0 = \{(0, 0, 0, x_{inv});\ x_{inv} \in \mathbb{N}\}$.
We want to check that for any $(x_0, x') \in \mathcal{R}_P^*$ such that $x_0 \in X_0$, we have
$x'_{mo} + x'_{ex} \le 1$. Let us denote by $\mathcal{R}_{bad}$ the relation corresponding to this
property and defined by $(x_{mo}, x_{ex}, x_{sh}, x_{inv})\ \mathcal{R}_{bad}\ (x'_{mo}, x'_{ex}, x'_{sh}, x'_{inv})$ if and
only if $(x_{mo}, x_{ex}, x_{sh}, x_{inv}) \in X_0$ and $x'_{mo} + x'_{ex} \ge 2$. We just have to check that
$\mathcal{R}_P^* \cap \mathcal{R}_{bad} = \emptyset$.
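
As a sanity check of the three actions and of the property, the system can be explored explicitly for small numbers of memory lines. The sketch below is not the semi-affine computation described in this paper: it is a plain breadth-first enumeration, with hypothetical function names, that only covers the finitely many initial states $(0, 0, 0, n)$ with $n$ up to a chosen bound. It terminates because each action preserves the sum of the four counters.

```python
from collections import deque
from typing import Iterator, Set, Tuple

State = Tuple[int, int, int, int]  # (x_mo, x_ex, x_sh, x_inv)


def successors(state: State) -> Iterator[State]:
    """One-step successors under the actions a1, a2, a3 of the MESI system."""
    mo, ex, sh, inv = state
    if inv >= 1:                       # a1
        yield (0, 0, sh + ex + mo + 1, inv - 1)
    if ex >= 1:                        # a2
        yield (mo + 1, ex - 1, sh, inv)
    if sh >= 1 or inv >= 1:            # a3
        yield (0, 1, 0, mo + ex + sh + inv - 1)


def reachable(n: int) -> Set[State]:
    """States reachable from the initial state (0, 0, 0, n)."""
    start = (0, 0, 0, n)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in successors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def property_holds(max_lines: int) -> bool:
    """Check x_mo + x_ex <= 1 on every state reachable from (0, 0, 0, n), n <= max_lines."""
    return all(mo + ex <= 1
               for n in range(max_lines + 1)
               for (mo, ex, _sh, _inv) in reachable(n))
```

Such a bounded check says nothing about larger values of $x_{inv}$; covering all of $X_0$ at once is what the semi-affine star hull provides.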
By computing the semi-affine star hull of the reachability relation of $P$, we
obtain the following semi-affine relation:

$(x_{mo}, x_{ex}, x_{sh}, x_{inv})\ \mathrm{saff}^*(\mathcal{R}_P)\ (x'_{mo}, x'_{ex}, x'_{sh}, x'_{inv})$ if and only if

^sh — 0 A Xfjio ^ex ^sh ^inv —
x^.
+ 1 )



From $\mathrm{saff}^*(\mathcal{R}_P) \cap \mathcal{R}_{bad} = \emptyset$ we deduce the property $\mathcal{R}_P^* \cap \mathcal{R}_{bad} = \emptyset$. However,
the computation of the place invariants (the least reflexive and transitive
affine relation that contains $\mathcal{R}_P^*$, see [Ler04]) just provides the relation
$x'_{mo} + x'_{ex} + x'_{sh} + x'_{inv} = x_{mo} + x_{ex} + x_{sh} + x_{inv}$, which is not sufficient for
proving the property. This problem comes from the fact that place invariants
provide a large over-approximation of the reachability relation of a numerical
system $N$ as soon as there exists an action $a \in \Sigma$ such that $\mathcal{R}_a$ is a "reset/transfer
transition". When the numerical system $N$ does not use such a relation, in
practice, the semi-affine star hull of the reachability relation is equal to
the least reflexive and transitive affine relation that contains $\mathcal{R}_N^*$. This is for
instance the case for any Petri Net (see [Ler03]).

Abstract. This paper deals with equivalence checking of high-level
hardware design descriptions. For this purpose, we propose a method 
based on validity checking of the quantifier-free first-order logic with 
equality, combined with boolean validity checking. Since, in the first- 
order logic, arithmetic functions or inequalities can be abstractly repre- 
sented by function or predicate symbols, its validity checking becomes 
more efficient than boolean-based one. The logic, however, ignores the 
original semantics of the function or predicate symbols. For example, the 
commutative law over addition cannot be handled. As a result, the ver- 
ification results are often ‘false negative’. To overcome this difficulty, we 
propose an algorithm based on replacement of the function or predicate 
symbols with vectors of boolean formulas. To avoid replacing the entire 
original first-order formula with boolean functions, the algorithm utilizes 
counterexamples obtained from validity checking for the first-order logic. 
This paper also reports a preliminary experimental result for equivalence 
checking of high-level descriptions by a prototype implementation of the 
proposed algorithm. 



1 Introduction 

Recent progress in integrated circuit design technology has enabled implementation
of very large logic systems on a single chip, so that their design verification
requires more time and resources. Meanwhile, severe time-to-market goals require
the verification process to be much shorter. To achieve completeness of
verification results, formal methods for gate-level descriptions have been widely
utilized in circuit design processes. In particular, equivalence checking of
combinational circuits, and sequential equivalence checking or model checking
of sequential designs, have been applied to industrial designs successfully.

Demands for larger circuit designs have been a driving force for behavioral
descriptions of designs at early stages of design processes. Formal verification
tasks for high-level designs, however, cannot be handled with the above boolean
methods in many cases, because high-level descriptions such as those for digital
signal processing tend to contain complex arithmetic operations. To handle such
descriptions, we can use quantifier-free first-order logic with equality, or
first-order logic with uninterpreted functions. For this logic, equivalence checking of
two formulas, or validity checking of a formula, is known to be decidable [2,
1]. Since, in the logic, arithmetic functions or inequalities can be abstractly
represented by function or predicate symbols, its validity checking becomes more
efficient than boolean-based checking. In [3], Burch et al. have proposed a verification
method for pipelined processors with this logic. Several methods for improving
the performance of verification have been devised, for example, in [4,5]. Also,
the validity checking of the logic has been applied to equivalence checking of
high-level descriptions [6,7,8].

The logic, however, ignores the original semantics of the function or predicate
symbols at the boolean level. For example, x + y = y + x becomes invalid, because
the commutative law over addition cannot be handled. The two formulas x < y
and (x − y) < 0 are not regarded as equivalent either. The verification results
can often be 'false negative', and meaningless.

One method for solving this problem is to give meanings to function and
predicate symbols and to apply theorem proving. General theorem proving, however,
cannot guarantee decidability. As a result, it is not appropriate for automation.
We can use Presburger arithmetic or linear arithmetic for guaranteeing
termination of the decision procedure, but high-level design descriptions generally
include multiplication and division. It is difficult to give natural descriptions in
the above formal systems. Indeed, it is desirable to prove equivalence by theorem
proving as far as decidability is guaranteed, but we cannot expect this approach
to cover all instances.

On the other hand, as far as high-level design descriptions are concerned,
implementation as circuits is assumed. It is often the case that various operations
in the descriptions assume finite bit-length. In this case, there is a possibility
that (a+b)+c = a+(b+c) does not hold because of the limited bit-length. It is required
to give interpretations at the boolean level to handle cases like this.
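
Whether associativity actually fails depends on which finite-width semantics is chosen, and the text above does not fix one. The following sketch is an illustration under an assumption of ours: it compares 4-bit signed saturating addition (results clamped to the range −8..7), where a counterexample exists, with 4-bit wrap-around addition, where associativity still holds. The function names are hypothetical.

```python
def saturating_add(a: int, b: int, bits: int = 4) -> int:
    """Signed addition clamped to the representable range (assumed semantics)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


def wrapping_add(a: int, b: int, bits: int = 4) -> int:
    """Signed two's-complement addition with wrap-around."""
    half = 1 << (bits - 1)
    return (a + b + half) % (1 << bits) - half


a, b, c = 7, 1, -1
left = saturating_add(saturating_add(a, b), c)    # (a + b) + c, the inner sum saturates
right = saturating_add(a, saturating_add(b, c))   # a + (b + c), nothing saturates

values = range(-8, 8)
wrap_is_associative = all(
    wrapping_add(wrapping_add(x, y), z) == wrapping_add(x, wrapping_add(y, z))
    for x in values for y in values for z in values
)
```

With these inputs the two groupings differ under saturation, while the exhaustive check confirms that wrap-around addition of the same width stays associative. This is why the bit-level interpretation given to each symbol matters.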

To overcome the above difficulty, we propose an algorithm based on replace- 
ment of the function or predicate symbols with vectors of boolean formulas. 
Here, we assume that semantics at boolean level is given to function or pred- 
icate symbols. To avoid replacing the entire original first-order formula with 
boolean functions, the algorithm utilizes counterexamples obtained from valid- 
ity checking for first-order logic. 

A similar approach which uses both first-order formulas and
boolean formulas can be found in [9]. In this method, equivalence checking is
performed for multiple instances. Instances for which abstracted function or
predicate symbols cause 'false negative' results are converted to boolean
formulas, and boolean validity checking is performed on them. In our algorithm,
replacements with boolean formulas are not necessarily performed for the entire
first-order logic formula.

This paper also reports an experimental result for equivalence checking of
high-level descriptions by a prototype implementation of the proposed algorithm.
This implementation uses CVC-lite [10] and zChaff [11] for validity checking of the
first-order logic formulas with uninterpreted functions and the boolean formulas
respectively.

In the following, Section 2 describes the overview of the algorithm. Section
3 defines the syntax and the semantics of the logic and Section 4 explains the
algorithm in detail. Section 5 reports a preliminary experimental result.



2 Overview 

Validity checking for quantifier-free first-order logic with equality ignores the
semantics of the functions or inequalities and treats them as 'uninterpreted'.
For example, suppose that the function symbol $f$ represents addition. Then,
$f(x, y) = f(y, x)$ becomes true if the commutative law over the operation
is considered. This formula, however, becomes invalid in validity checking for
quantifier-free first-order logic with equality. To overcome this difficulty, we
provide boolean interpretations to function or predicate symbols, and perform
validity checking under the interpretations. As for the instance described
above, we give the following boolean interpretations of bit-length 2 to the
function symbol $f$. Here, we ignore carry bits.

$f(x, y) = (x_0 \wedge y_0 \oplus x_1 \oplus y_1,\ x_0 \oplus y_0)$

$f(y, x) = (y_0 \wedge x_0 \oplus y_1 \oplus x_1,\ y_0 \oplus x_0)$

By performing equivalence checking of boolean formulas corresponding to each
component, $f(x, y) = f(y, x)$ can be shown.
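
The component-wise check can be reproduced by brute force over all 2-bit inputs. This is a small sketch that assumes the usual reading in which $\wedge$ binds tighter than $\oplus$ and in which a value is the pair (bit 1, bit 0). It enumerates assignments instead of calling a boolean validity checker, and the function names are ours.

```python
from itertools import product


def add2(x, y):
    """2-bit addition without carry-out: x = (x1, x0), y = (y1, y0)."""
    (x1, x0), (y1, y0) = x, y
    return ((x0 & y0) ^ x1 ^ y1, x0 ^ y0)


def value(bits):
    """Integer denoted by the pair (bit 1, bit 0)."""
    return 2 * bits[0] + bits[1]


PAIRS = list(product([0, 1], repeat=2))

# f(x, y) = f(y, x) on every pair of 2-bit vectors.
commutative = all(add2(x, y) == add2(y, x) for x in PAIRS for y in PAIRS)

# The interpretation agrees with addition modulo 4.
matches_mod4 = all(value(add2(x, y)) == (value(x) + value(y)) % 4
                   for x in PAIRS for y in PAIRS)
```

For realistic bit-lengths the enumeration is replaced by a boolean validity checker, but the formulas being compared are the same.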

As another example, we consider $ITE((x < y), x, y) = ITE(((x - y) < 0), x, y)$,
where $ITE$ represents if-then-else, that is, $ITE(\alpha, t_1, t_2)$ represents $t_1$ if
$\alpha$ is true, else $t_2$. Assume that predicate symbol $p_<$ expresses arithmetic
comparison $<$ and function symbol $f_-$ expresses arithmetic operation $-$. This
formula is described as $ITE(p_<(x, y), x, y) = ITE(p_<(f_-(x, y), 0), x, y)$. Performing
validity checking for quantifier-free first-order logic with equality on this
formula, it becomes invalid. The assumptions that cause this formula to be
invalid are $p_<(x, y)$ and $\neg p_<(f_-(x, y), 0)$. Asserting these two assumptions and
transforming this formula, $(x = y)$ is obtained. For arbitrary interpretation
for $p_<$, $f_-$, $x$ and $y$, the formula $p_<(x, y) \wedge \neg p_<(f_-(x, y), 0) \Rightarrow (x = y)$ is
not always true, and thus $ITE((x < y), x, y) = ITE(((x - y) < 0), x, y)$ is
not regarded as valid. In the proposed algorithm, we provide interpretations
by boolean expressions, and perform validity checking. Providing $p_<$ and $f_-$
with the boolean semantics as inequality and subtraction of finite bit-length
respectively, $p_<(x, y) \wedge \neg p_<(f_-(x, y), 0) \Rightarrow (x = y)$ becomes valid. Then, giving
$p_<(x, y) \wedge \neg p_<(f_-(x, y), 0) \Rightarrow (x = y)$ as a new assumption, validity checking
for quantifier-free first-order logic with equality is performed again for the
original formula. This procedure is repeated until the validity of the formula is
determined. The outline of this algorithm is shown in Fig. 1. The detail of the
algorithm is described in Section 4.

Fig. 1. Outline of the algorithm



3 First-Order Logic with Uninterpreted Functions 

In this section, we define quantifier-free first-order logic with equality, or first- 
order logic with uninterpreted functions. 



Syntax 

The syntax is composed of terms and formulas. Terms are defined recursively as
follows. Domain variables $x, y, z, \ldots$ and domain constants $a, b, c, \ldots$ are terms. If
$t_1, t_2, \ldots, t_n$ are all terms, then $f(t_1, t_2, \ldots, t_n)$ is a term, where $f$ is a function
symbol of order $n$ ($n > 0$). The order means how many arguments can be taken
by the function symbol. For a formula $\alpha$ and terms $t_1$ and $t_2$, $ITE(\alpha, t_1, t_2)$ is a
term.

Formulas are defined recursively as follows. true and false are formulas.
Boolean variables are formulas. $p(t_1, t_2, \ldots, t_n)$ is a formula, where $p$ is a
predicate symbol of order $n$ ($n > 0$). An equation $t_1 = t_2$ is a formula. If $\alpha_1$ and $\alpha_2$
are formulas, then the negation $\neg\alpha_1$ is a formula, and $\alpha_1 \circ \alpha_2$ is also a formula
for any binary boolean operator $\circ$.

Semantics

For a nonempty domain $\mathcal{D}$ and an interpretation $\mathcal{I}$ of function symbols and
predicate symbols, the truth of a formula is defined. The interpretation $\mathcal{I}$ maps
a function symbol (resp. predicate symbol) of order $k$ to a function $\mathcal{D}^k \to \mathcal{D}$
(resp. $\mathcal{D}^k \to \{true, false\}$). Also $\mathcal{I}$ assigns each domain variable and each
constant to an element in $\mathcal{D}$. We restrict the interpretation of constants so that
values assigned to distinct constants differ from each other. Boolean variables
are assigned to $\{true, false\}$.

Valuations of a term $t$ and a formula $\alpha$, denoted by $\mathcal{I}(t)$ and $\mathcal{I}(\alpha)$ respectively,
are defined as follows. For a term $t = f(t_1, t_2, \ldots, t_n)$, $\mathcal{I}(t) = \mathcal{I}(f)(\mathcal{I}(t_1), \mathcal{I}(t_2),
\ldots, \mathcal{I}(t_n))$. For a term $t = ITE(\alpha, t_1, t_2)$, $\mathcal{I}(t) = \mathcal{I}(t_1)$ if $\mathcal{I}(\alpha)$ is true,
otherwise $\mathcal{I}(t) = \mathcal{I}(t_2)$. For a formula $\alpha = p(t_1, t_2, \ldots, t_n)$, $\mathcal{I}(\alpha) = \mathcal{I}(p)(\mathcal{I}(t_1),
\mathcal{I}(t_2), \ldots, \mathcal{I}(t_n))$. For an equation $\alpha = (t_1 = t_2)$, $\mathcal{I}(\alpha) = true$ if and only
if $\mathcal{I}(t_1) = \mathcal{I}(t_2)$. For $\alpha = \neg\alpha_1$ or $\alpha = \alpha_1 \circ \alpha_2$ for a boolean operator $\circ$,
$\mathcal{I}(\alpha) = \neg\mathcal{I}(\alpha_1)$ and $\mathcal{I}(\alpha) = \mathcal{I}(\alpha_1) \circ \mathcal{I}(\alpha_2)$ respectively.

Validity. A formula $\alpha$ is valid if and only if $\mathcal{I}(\alpha)$ is true for any interpretation
$\mathcal{I}$ and any domain $\mathcal{D}$.

4 Validity Checking Algorithm 

Algorithm 

In the following algorithm, we use validity checking for quantifier-free first-order 
logic with equality and validity checking for boolean formulas. Fig. 1 shows the 
overall structure of the algorithm. 

The detail of the proposed algorithm is shown as procedure Validity-Checker
in Fig. 2. Inputs are a set of assumptions $A = \{A_1, A_2, \ldots, A_n\}$, which is initially
an empty set, the mapping $h$, which associates each of predicate and function
symbols with a boolean function and vectors of boolean functions respectively,
and a quantifier-free first-order formula $G$. This algorithm determines whether
the formula is valid under $A$, or more precisely, whether $\bigwedge_{i=1}^{n} A_i \Rightarrow G$ is valid or
not.

This procedure performs validity checking for quantifier-free first-order logic
with equality by procedure Check-V (line 2). This procedure returns VALID if
the formula $G$ is valid. Then, Validity-Checker returns 'valid' and terminates.
Otherwise, it returns a set of assumptions $B = \{B_1, B_2, \ldots, B_m\}$, which cause $G$
to be invalid. For example, suppose that the formula $G$ is $ITE(p_<(x, y), x, y) =
ITE(p_<(f_-(x, y), 0), x, y)$, where predicate symbol $p_<$ expresses arithmetic
comparison $<$ and function symbol $f_-$ expresses arithmetic operation $-$. Then
Check-V returns assumptions $B = \{p_<(x, y), \neg p_<(f_-(x, y), 0)\}$, which implies
$G$ is invalid as first-order logic. Next, the formula $G$ is simplified under the
assumptions $B = \{B_1, B_2, \ldots, B_m\}$ by procedure Transform (in Fig. 2). This
transformation derives the 'core' formula which cannot be determined as equivalent.
As for the above example, Transform($G$, $B$) returns $x = y$. Let the resulting
formula of this transformation be $G'$.¹ We convert $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ to a boolean
formula, and check its validity by procedure Check-TF (line 5 in Fig. 2). Procedure
Check-TF in Fig. 3 replaces $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ with boolean formulas, and checks
its validity. Procedure Bool-Checker in Check-TF performs validity checking
for a boolean formula. Procedure Check-TF returns VALID if it is valid (line
5 in Fig. 3). Otherwise, it returns INVALID (line 6 in Fig. 3). If the boolean
formula obtained by the replacement becomes valid, $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ is added as a
new assumption to $A$ (line 6 in Fig. 2), and validity checking is repeated. If the
boolean formula is invalid, the input formula can be determined as 'invalid',
because $G$ under $\bigwedge_{i=1}^{m} B_i$ is certainly a counterexample.

We show how to replace
a quantifier-free first-order logic formula by a boolean formula in the following.
This is performed by Boolean in Check-TF. We assume that $t_1, t_2, \ldots, t_n$ are
terms, $f$ is a function symbol, $p$ is a predicate symbol, and $\alpha$ is a logic formula.
Terms or predicates are converted to boolean formulas or vectors of boolean
formulas respectively. The following mapping $H : \mathcal{F} \to \beta$ is the conversion
procedure, where $\beta$ is a set of vectors of boolean formulas and $\mathcal{F}$ is a set of logic
formulas. Recall that the mapping $h$ associates each of predicate and function
symbols with a boolean function and vectors of boolean functions respectively.



[Procedure] Replacement by Boolean Formulas 

H(F) is defined recursively as follows. We assume that each function or predicate 
symbol is associated with a boolean formula or a vector of boolean formulas. 

• $H(F) = (c_1, c_2, \ldots, c_s)$ where $c_i \in \{true, false\}$, if $F$ is a constant. 

• $H(F) = (b_1, b_2, \ldots, b_s)$ where each $b_i$ is a boolean variable, if $F$ is a variable. 

• $H(F) = h(f)(H(t_1), H(t_2), \ldots, H(t_n))$, if $F$ is $f(t_1, t_2, \ldots, t_n)$. 

• $H(F) = h(p)(H(t_1), H(t_2), \ldots, H(t_n))$, if $F$ is $p(t_1, t_2, \ldots, t_n)$. 

• $H(F) = (H(t_1) = H(t_2))$, if $F$ is $t_1 = t_2$. 

• $H(F) = H(F_1) \circ H(F_2)$, if $F$ is $F_1 \circ F_2$. 

• $H(F) = \neg H(F_1)$, if $F$ is $\neg F_1$. 

• $H(F) = (H(a) \wedge H(t_1)) \vee (\neg H(a) \wedge H(t_2))$, if $F$ is $ITE(a, t_1, t_2)$. 
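
The recursive structure of H can be illustrated with a small Python sketch. It is not the authors' implementation: instead of building boolean formulas it evaluates them under one fixed bit assignment `env`, and the tuple encoding of formulas, the width `WIDTH`, and the two entries of the mapping `h` (unsigned 4-bit MINUS and LESSTHAN) are assumptions made only for this illustration.

```python
WIDTH = 4  # bit width s of every vector (assumed for this sketch)

def to_bits(n):
    """Low WIDTH bits of integer n, least significant bit first."""
    return tuple(bool((n >> i) & 1) for i in range(WIDTH))

def from_bits(bits):
    """Unsigned integer encoded by a bit vector."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)

# Hypothetical mapping h: a vector-valued function per function symbol,
# a boolean-valued function per predicate symbol.
h = {
    "MINUS": lambda a, b: to_bits(from_bits(a) - from_bits(b)),
    "LESSTHAN": lambda a, b: from_bits(a) < from_bits(b),
}

def H(F, env):
    """Evaluate the boolean replacement of F under the bit assignment env."""
    kind = F[0]
    if kind == "const":
        return to_bits(F[1])
    if kind == "var":
        return env[F[1]]
    if kind == "app":  # f(t1, ..., tn) or p(t1, ..., tn)
        return h[F[1]](*(H(t, env) for t in F[2:]))
    if kind == "eq":
        return H(F[1], env) == H(F[2], env)
    if kind == "not":
        return not H(F[1], env)
    if kind == "and":
        return H(F[1], env) and H(F[2], env)
    if kind == "or":
        return H(F[1], env) or H(F[2], env)
    if kind == "ite":  # bitwise (a AND t1) OR (NOT a AND t2)
        a, t1, t2 = (H(part, env) for part in F[1:])
        return tuple((a and x) or (not a and y) for x, y in zip(t1, t2))
    raise ValueError(f"unknown node {kind!r}")

x, y = ("var", "x"), ("var", "y")
env = {"x": to_bits(3), "y": to_bits(5)}
smaller = ("ite", ("app", "LESSTHAN", x, y), x, y)
wrapped = ("eq", ("app", "MINUS", x, y), ("const", 14))
```

With x = 3 and y = 5, `H(smaller, env)` is the bit vector of 3, and `H(wrapped, env)` holds because 3 - 5 wraps around to 14 in 4 bits. A different choice of `h` (for example signed comparison) would give different results, which is why the mapping is an explicit input of the algorithm.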



[*] In our implementation, B is a set of formulas returned by CVC-lite as a set of 
assertions which cause G to be invalid. Also, the transformation of G under B is 
performed by command ‘TRANSFORM’ of CVC-lite. 

INPUT   $A = \{A_1, A_2, \ldots, A_n\}$, $G$ : logic formula, $h : S \to \beta$, 
        where $S$ is a set of function or predicate symbols used in $G$, 
        $\beta$ is a set of vectors of boolean formulas. 

OUTPUT  whether $\bigwedge_{i} A_i \Rightarrow G$ becomes valid or not under $h$. 

 1  Validity_Checker(A, G, h) { 
 2    while ((B = Check_V(A, G)) != "VALID") 
 3    do begin 
 4      G' = Transform(G, B); 
 5      if (Check_TF(B, G', h) == "VALID") 
 6      then $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ is added to A; 
 7      else begin 
 8        print("invalid."); 
 9        exit; 
10      end; 
11    end; 
12    print("valid."); 
13  } 

Fig. 2. Procedure Validity_Checker 



INPUT   $B = \{B_1, B_2, \ldots, B_m\}$, $TF$ : logic formula, $h : S \to \beta$. 
OUTPUT  "VALID", if the formula resulting from the replacement for 
        $\bigwedge_{i=1}^{m} B_i \Rightarrow TF$ is valid. Otherwise, "INVALID". 

1  Check_TF(B, TF, h) { 
2    GTF := Boolean($\bigwedge_{i=1}^{m} B_i \Rightarrow TF$, h); 
3    /* replacement of $\bigwedge_{i=1}^{m} B_i \Rightarrow TF$ by boolean formulas. */ 
4    if (Bool_Checker(GTF) == "VALID") 
5    then return("VALID"); 
6    else return("INVALID"); 
7  } 

Fig. 3. Procedure Check_TF 
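
The interplay of the two procedures can be sketched in Python on a deliberately simplified setting. This is an illustration under stated assumptions, not the authors' implementation: the "first-order" level treats each atom as an unconstrained boolean (standing in for Check_V), the "boolean" level evaluates every atom by its concrete meaning over a small finite domain (standing in for Check_TF), and the Transform step is omitted, i.e. G' = G. All names (`validity_checker`, `meaning`, `domain`) are hypothetical.

```python
from itertools import product

def validity_checker(atoms, goal, meaning, domain):
    """Refinement loop in the style of Fig. 2 on a toy abstraction."""
    lemmas = []  # the assumption set A

    def check_v():
        # Abstract check: look for an atom valuation B that satisfies all
        # assumptions but falsifies the goal. None plays the role of "VALID".
        for values in product([False, True], repeat=len(atoms)):
            b = dict(zip(atoms, values))
            if all(lemma(b) for lemma in lemmas) and not goal(b):
                return b
        return None

    def check_tf(b):
        # Concrete check of AND(B) => goal over the finite domain.
        for point in domain:
            concrete = {a: meaning[a](*point) for a in atoms}
            if concrete == b and not goal(concrete):
                return False  # B is realisable: a true counterexample
        return True

    rounds = 0
    while (b := check_v()) is not None:
        rounds += 1
        if not check_tf(b):
            return "invalid", rounds
        # Add AND(B) => goal as a new assumption; it excludes B next time.
        lemmas.append(lambda v, b=b: v != b or goal(v))
    return "valid", rounds

atoms = ["lt", "ge"]
meaning = {"lt": lambda x, y: x < y, "ge": lambda x, y: x >= y}
domain = [(x, y) for x in range(4) for y in range(4)]
total = validity_checker(atoms, lambda v: v["lt"] or v["ge"], meaning, domain)
only_lt = validity_checker(atoms, lambda v: v["lt"], meaning, domain)
```

Here `total` is `("valid", 1)`: the abstract counterexample in which both atoms are false is spurious and is excluded by one added assumption. `only_lt` is `("invalid", 2)`, because the second abstract counterexample is realised by a concrete point such as x = y = 0.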



Example 

We illustrate the proposed algorithm with the following example. We consider 
$ITE(p_<(x,y), x, y) = ITE(p_<(f_-(x,y), 0), x, y)$, where predicate symbol $p_<$ 
expresses arithmetic comparison $<$ and function symbol $f_-$ expresses arithmetic 
operation $-$. Procedure Check_V performs validity checking for quantifier-free 
first-order logic with equality. In this instance, 
$ITE(p_<(x,y), x, y) = ITE(p_<(f_-(x,y), 0), x, y)$ is not valid in quantifier-free 
first-order logic with equality, so procedure Check_V returns assumptions 
$B = \{p_<(x,y), \neg p_<(f_-(x,y), 0)\}$. The formula is then simplified under the 
assumptions $B$ by procedure Transform, which returns $x = y$. Procedure Boolean 
replaces the formula $p_<(x,y) \wedge \neg p_<(f_-(x,y), 0) \Rightarrow x = y$ by a boolean 
formula, then Bool_Checker performs validity checking for the resulting formula. 
For this instance, the formula $p_<(x,y) \wedge \neg p_<(f_-(x,y), 0) \Rightarrow x = y$ is valid. 
Then this formula is added as a new assumption, and we obtain 
$\{ p_<(x,y) \wedge \neg p_<(f_-(x,y), 0) \Rightarrow x = y \}$ and 
$ITE(p_<(x,y), x, y) = ITE(p_<(f_-(x,y), 0), x, y)$ as the new $A$ and $G$ for procedure 
Check_V. By repeating this procedure, for this particular instance, procedure 
Check_V eventually returns ‘valid’, and the proposed algorithm terminates. 

Termination 

As described in the above algorithm, $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ is added as an assertion 
when it was determined as invalid for first-order logic and determined as valid 
under the interpretation as a boolean formula. In the next iteration step, validity 
checking for first-order logic tries to find a counterexample, if any, under the 
assumption that $\bigwedge_{i=1}^{m} B_i$ is not true. Repeating this procedure, the algorithm 
either finds a true counterexample along the way, or checks all cases, outputs 
‘valid’, and terminates. 

Performance 

In this algorithm, transformation to boolean formulas is performed based on 
counterexamples. In the worst case, it results in replacing the entire formula 
by boolean formulas. If the discrepancy between two terms to be compared for 
equivalence checking is not large, then the number of produced counterexamples 
is expected to be small. In such a case, we can expect that a large portion of the 
original formula need not be interpreted at the boolean level, and thus the 
total performance is expected to be better. 

Heuristics 

In order to make this algorithm more efficient, we add the following heuristic 
techniques. Firstly, we prevent redundant assumptions from being included. 
Suppose that $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ is added to the original formula as 
an assumption. Then, $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ itself often appears, at later steps, 
as a new assumption $B'_i$. In repeating this operation, $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ grows 
drastically. To avoid this problem, we delete $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ if it already exists 
among the assumptions that cause a counterexample. 

Secondly, we remove some assumptions from $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$. After performing 
validity checking of $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ in line 6 of Fig. 2, we perform boolean validity 
checking for the logic formula $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$ excluding $B_j$ from $\bigwedge_{i=1}^{m} B_i$. If the 
result is valid, we can delete $B_j$ from $\bigwedge_{i=1}^{m} B_i$, because it does not change the 
result. Otherwise, $B_j$ is not removed. Here we check $B_1, B_2, \ldots, B_m$ one by one, 
and delete each candidate when possible. 
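
The second heuristic can be sketched as follows. This is a minimal illustration, assuming that assumptions and the goal are given as Python predicates over a valuation of boolean variables and that validity is decided by brute-force enumeration (the role played by the boolean validity checker in the paper); the helper names `is_valid` and `drop_redundant` are hypothetical.

```python
from itertools import product

def is_valid(assumptions, goal, variables):
    """Brute-force check that AND(assumptions) => goal holds everywhere."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(a(env) for a in assumptions) and not goal(env):
            return False
    return True

def drop_redundant(assumptions, goal, variables):
    """Try to delete each B_j in turn; keep it only if validity needs it."""
    kept = list(assumptions)
    for candidate in list(assumptions):
        trial = [a for a in kept if a is not candidate]
        if is_valid(trial, goal, variables):
            kept = trial  # B_j does not change the result, so delete it
    return kept

variables = ["p", "q", "r"]
b1 = lambda env: env["p"]
b2 = lambda env: env["q"]                      # irrelevant to the goal
b3 = lambda env: (not env["p"]) or env["r"]    # p => r
goal = lambda env: env["r"]
reduced = drop_redundant([b1, b2, b3], goal, variables)
```

In this example `reduced` contains only `b1` and `b3`: dropping `b2` keeps the implication valid, while dropping either of the other two does not. The result depends on the order in which candidates are tried, as in the one-by-one scheme described above.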

5 Implementation and Experimental Results 

In this section, we report a preliminary experimental result for ADPCM 
(Adaptive Differential Pulse Code Modulation) code written in C, using 
a prototype implementation of the proposed algorithm. This implementation 
uses CVC-lite and zChaff for validity checking of quantifier-free first-order logic 
with equality and of boolean formulas, respectively. 

ADPCM is a procedure that compresses its input values by encoding the 
difference between each input value and the value just before it. 
We carry out the experiment for the body of the loop structure in the 
code. Fig. 4 shows the code used in the experiment. The input of the experiment is 
a quantifier-free first-order logic formula transformed from the C code by hand. 

We give two source codes with local differences such as ‘delta << 4’ and 
‘(delta << 2) << 2’. Each variable is defined as a variable of first-order logic. 
Arithmetic operations + and - are represented by function symbols PLUS and 
MINUS, and shift to the left and shift to the right are represented by function 
symbols LSHIFT and RSHIFT, respectively. Arithmetic comparisons are 
represented by predicate symbols LESSTHAN, LESSEQUAL, LARGERTHAN 
and LARGEREQUAL, and bitwise operations are represented by function 
symbols BITWISEAND and BITWISEOR. Using these function 
and predicate symbols, the two source codes are transformed into first-order logic 
formulas. The terms corresponding to ‘outp’ in the two source codes are 
connected with = as $outp_1 = outp_2$, which is given as an input to the validity 
checker. We give a vector of boolean functions of 4-bit width to each of the 
function symbols, and a boolean function with inputs of 4-bit width to each of the 
predicate symbols, and perform the experiment. In all of the following cases, 
the two codes are equivalent. 

    diff = val - valpred; 
    sign = (diff < 0) ? 8 : 0; 
    if (sign) diff = (-diff); 
    delta = 0; 
    if (diff >= step) { 
        delta = 4; 
        diff -= step; 
    } 
    step >>= 1; 
    if (diff >= step) { 
        delta |= 2; 
        diff -= step; 
    } 
    step >>= 1; 
    if (diff >= step) { 
        delta |= 1; 
    } 
    delta |= sign; 
    if (bufferstep) { 
        outputbuffer = (delta << 4) & 0xf0; 
    } else { 
        outp = (delta & 0x0f) | outputbuffer; 
    } 

Fig. 4. ADPCM code 



Table 1 shows the differences between the two codes and the run-times. ‘# of 
repetitions’ shows how many times the body of the while loop in Fig. 2 was 
performed. We used a PC with a Pentium 4 (2.4 GHz) and 2 GB of memory. The program 
was written in the C language and compiled with gcc 3.2.2 under RedHat Linux 9. 
Hybrid1 is the run-time when we do not delete $\bigwedge_{i=1}^{m} B_i \Rightarrow G'$, even though it 
exists among the assumptions that cause a counterexample. Hybrid2 is the run-time of 
the algorithm using the two heuristic methods in the previous section. 

The run-times depend on the differences between the two codes. The 
number of repetitions decreases if the differences between the two codes after 
transformation are fewer. The run-times for Hybrid1 are almost always larger 
than those for Hybrid2, because the number of assumptions for Hybrid1 becomes 
larger than for Hybrid2. 

Table 1. Experimental results 

Differences of two codes                                          # of         Hybrid1  Hybrid2
                                                                  repetitions  (sec)    (sec)
------------------------------------------------------------------------------------------------
diff < 0                         0 > diff                         16           49       31

diff >= step                     diff-step >= 0                   16           32       20

diff >= step                     step <= diff                     48           205      126
delta & 0x0f                     0x0f & delta

delta = delta | sign             delta = sign | delta             31           89       110
delta << 4                       (delta << 2) << 2

diff >= step                     step <= diff                     54           689      364
delta << 4                       (delta << 2) << 2
(delta & 0x0f) | outputbuffer    outputbuffer | (0x0f & delta)

6 Concluding Remarks 

For efficient equivalence checking of high-level hardware design descriptions, 
this paper proposes a method based on validity checking of the quantifier-free 
first-order logic with equality, combined with boolean validity checking. When 
function or predicate symbols are replaced with vectors of boolean formulas for 
precise validity checking, the algorithm utilizes counterexamples obtained from 
validity checking for the first-order logic, to avoid replacing the entire original 
first-order formula with boolean functions. We also reported a preliminary 
experimental result for equivalence checking of high-level descriptions by a prototype 
implementation of the proposed algorithm. 

Future work includes a more sophisticated algorithm for transforming function 
and predicate symbols into boolean functions. In this paper, we transform entire 
first-order logic formulas into boolean formulas, even though each of them is 
simplified by assumptions obtained from counterexamples. We expect that 
computational complexity can be improved by transforming only ‘key’ portions 
of the formulas into boolean formulas. Finding such portions automatically would 
be a necessary task for this purpose. 

The proposed approach is quite general, because no restriction is imposed 
on the operations used in the formulas. On the other hand, introducing theorem 
proving techniques, which can handle an inevitably restricted class of formulas for 
guaranteeing decidability, would be effective and efficient, as far as simple rules 
such as the commutative law are concerned. Combining the two techniques would 
be useful for more efficient validity checking. 
Abstract. This paper studies conflicts from a process-algebraic point 
of view and shows how they are related to the testing theory of fair test- 
ing. Conflicts have been introduced in the context of discrete event sys- 
tems, where two concurrent systems are said to be in conflict if they can 
get trapped in a situation where they are waiting or running endlessly, 
forever unable to complete their common task. In order to analyse com- 
plex discrete event systems, conflict-preserving notions of refinement and 
equivalence are needed. This paper characterises an appropriate refine- 
ment, called the conflict preorder, and provides a denotational semantics 
for it. Its relationship to other known process preorders is explored, and 
it is shown to generalise the fair testing preorder in process-algebra for 
reasoning about conflicts in discrete event systems. 



1 Introduction 

Conflicts are a common fault in the design of concurrent programs that can 
be very subtle and hard to detect [8,24]. They have long been studied in the 
field of discrete event systems [5,21,26], which is applied to the modelling of 
complex, safety-critical systems [1,13,14]. In order to improve the reliability of 
such systems, techniques are needed to facilitate the design of systems that are 
free from conflict. 

Two processes are said to be in conflict if they can reach a state from which 
no terminal state can be reached anymore. This includes both the possibility of 
deadlock, where processes are stuck and unable to continue at all, and livelock, 
where processes continue to run forever without achieving any further progress. 

The idea of being in conflict is strongly coupled to a built-in notion of fairness. 
Divergent computations are not treated as problematic — a system is understood 
to be free from conflict if, from every reachable state, all involved processes 
can cooperatively reach a terminal state. Such a concept of possible termination 
with respect to fairness assumptions is very useful for several applications, e.g. 
communication protocols [16,19]. 

In spite of the apparent simplicity of the concept, conflicts are difficult to 
analyse in a modular way [24]. For purposes of model checking [6,7], the property 
of two processes being free from conflict is expressed by a CTL formula such as 

    $\mathrm{AG}\ \mathrm{EF}\ \mathit{terminal\_state}$ ,    (1) 

where $\mathit{terminal\_state}$ is a propositional formula identifying the states in which 
both processes have terminated with success. This formula is neither in $\forall$CTL* 
nor in $\exists$CTL*, which explains why many known abstraction techniques [7] cannot 
be used for this kind of property. 

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 120-134, 2004. 

Several approaches of modelling and refinement have been proposed for dis- 
crete event systems [12,25,24], each suggesting a special way of designing systems 
such that they are free from conflict by construction. While giving some insight 
into how systems can be designed to be free from conflict, these techniques rely 
on interfaces provided by users and therefore cannot be applied automatically. 

Research in process algebra has focused on general ways of composing 
systems, and identifying notions of refinement and equivalence that preserve 
properties of interest [23]. Yet, standard process-algebraic approaches are either based 
on failures [10,22], which consider divergent computations as catastrophic and 
therefore cannot be applied for conflict analysis, or on bisimulation [18], which 
provides a correct but unnecessarily fine distinction. 

The process-algebraic theory of fair testing, which has been developed in- 
dependently by two groups of researchers in [2,3] and [19], provides the formal 
framework needed to characterise conflict-preserving refinements. The present 
paper applies and extends the results about fair testing to the setting of discrete 
event systems, providing a process-algebraic framework for new algorithms for 
conflict analysis. 

Section 2 introduces notations and provides a definition of conflicts. Section 3 
presents the conflict preorder and explores its relationship to fair testing and 
other known semantics. The conflict preorder is shown to be the best possible 
refinement to reason about conflicts. Finally, section 4 contains some concluding 
remarks. 



2 Notation 

This section introduces the notations used throughout this paper. Processes are 
represented as labelled transition systems, with the possibility of nondetermin- 
ism which naturally arises from abstraction and hiding operations [10,18,22]. 
Process behaviour is described using languages, with notations taken from the 
background of discrete event systems theory [5,21]. 

2.1 Languages 

Traces and languages are a simple means to describe process behaviours. Their 
basic building blocks are actions, which are taken from a finite alphabet $A$. Then 
$A^*$ denotes the set of all finite strings or traces of the form $a_1 a_2 \cdots a_n$ of actions 
from $A$, including the empty trace $\varepsilon$. A language over $A$ is any subset $\mathcal{L} \subseteq A^*$. 

The catenation of two traces $s, t \in A^*$ is written as $st$. Strings and languages 
can also be catenated, e.g., $s\mathcal{L} = \{ st \mid t \in \mathcal{L} \}$. The prefix-closure $\overline{\mathcal{L}}$ of a 
language $\mathcal{L} \subseteq A^*$ is the set of all prefixes of traces in $\mathcal{L}$, i.e., 
$\overline{\mathcal{L}} \stackrel{\mathrm{def}}{=} \{ s \in A^* \mid st \in \mathcal{L} \text{ for some } t \in A^* \}$. 
A language $\mathcal{L}$ is called prefix-closed if $\mathcal{L} = \overline{\mathcal{L}}$. 
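
For finite languages these operations are easy to make concrete. The following Python sketch assumes that a trace is represented as a tuple of action names and a language as a set of such tuples; the function names are chosen for this illustration only.

```python
def prefix_closure(language):
    """All prefixes of all traces in the language, including the empty trace."""
    return {trace[:i] for trace in language for i in range(len(trace) + 1)}

def is_prefix_closed(language):
    """A language is prefix-closed if it equals its own prefix-closure."""
    return set(language) == prefix_closure(language)

def catenate(s, language):
    """The language sL = { st | t in L } for a single trace s."""
    return {s + t for t in language}

L = {("a", "b"), ("c",)}
closure = prefix_closure(L)
```

Here `closure` consists of the empty trace `()`, `("a",)`, `("a", "b")` and `("c",)`; `L` itself is not prefix-closed, while `closure` is.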
The special action symbol $\omega \notin A$ is used to indicate successful termination. 
Then, $A^\omega \stackrel{\mathrm{def}}{=} A \cup \{\omega\}$ denotes the set of actions including $\omega$. The termination 
action only makes sense at the end of an execution, therefore all action sequences 
should belong to $A^{*\omega} \stackrel{\mathrm{def}}{=} A^* \cup A^*\omega$. 



2.2 Processes 

In the context of this paper, it is sufficient to model systems or processes as 
nondeterministic labelled transition systems 

    $P = \langle A, Q, \to, q^\circ, Q^\omega \rangle$ ,    (2) 

where $A$ is the alphabet of actions, $Q$ is the non-empty set of states, 
$\to\ \subseteq Q \times A \times Q$ is the transition relation, $q^\circ \in Q$ is the initial state, 
and $Q^\omega \subseteq Q$ is the set of success states. The transition relation is also written as 

    $q \xrightarrow{a} q'$ if and only if $(q, a, q') \in\ \to$ .    (3) 

If the transition relation can be described as a function, i.e., if $q \xrightarrow{a} q_1$ and 
$q \xrightarrow{a} q_2$ always implies $q_1 = q_2$, then $P$ is called deterministic. 
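
A finite labelled transition system of this form can be written down directly as a data structure. The sketch below is one possible encoding, assuming states are strings and the transition relation is a set of (source, action, target) triples; the class name `LTS` and the helper `is_deterministic` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LTS:
    alphabet: frozenset
    states: frozenset
    transitions: frozenset  # triples (source, action, target)
    initial: str
    success: frozenset

def is_deterministic(p):
    """True if no state has two different targets for the same action."""
    targets = {}
    for source, action, target in p.transitions:
        if targets.setdefault((source, action), target) != target:
            return False
    return True

branching = LTS(
    alphabet=frozenset({"a", "b"}),
    states=frozenset({"q0", "q1", "q2"}),
    transitions=frozenset({("q0", "a", "q1"), ("q0", "a", "q2"), ("q1", "b", "q2")}),
    initial="q0",
    success=frozenset({"q2"}),
)
linear = LTS(
    alphabet=frozenset({"a", "b"}),
    states=frozenset({"q0", "q1", "q2"}),
    transitions=frozenset({("q0", "a", "q1"), ("q1", "b", "q2")}),
    initial="q0",
    success=frozenset({"q2"}),
)
```

`branching` is nondeterministic because state `q0` has two `a`-successors, whereas `linear` is deterministic.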

Labelled transition systems, i.e., processes, are represented graphically as 
shown in fig. 1: states are represented as nodes, with the initial state highlighted 
by a thick border and success states shaded grey. The transition relation is 
represented by labelled edges. 

Processes and tests use the set of success states to indicate the possibility 
of successful termination. In order to translate this into an action-based 
representation, every process is assumed to have a terminal state $\top \in Q \setminus Q^\omega$, from 
which there are no outgoing transitions. Then the transition relation is extended 
to a relation $\to\ \subseteq Q \times A^\omega \times Q$ by adding transitions 

    $q^\omega \xrightarrow{\omega} \top$ for each $q^\omega \in Q^\omega$ .    (4) 

This construction makes it possible to represent termination only by means of 
the termination action $\omega$ which, if it occurs, is always the last action of any 
execution. 

The action-labelled transition relation $\to$ is further extended to a 
string-labelled transition relation $\Rightarrow\ \subseteq Q \times A^{*\omega} \times Q$ defined as 

    $q \stackrel{\varepsilon}{\Rightarrow} q$ for each $q \in Q$ ;    (5) 

    $q \stackrel{sa}{\Rightarrow} q'$ if and only if $q \stackrel{s}{\Rightarrow} q_s \xrightarrow{a} q'$ for some $q_s \in Q$ .    (6) 

The set of all labelled transition systems, and thus all processes, with 
action alphabet $A$ is denoted by $\Pi_A$. The transition relation is also defined for 
processes, denoting by 

    $P \stackrel{s}{\Rightarrow} P'$    (7) 

that process $P \in \Pi_A$ evolves into $P' \in \Pi_A$ by executing actions $s \in A^{*\omega}$. This is 
defined as $\langle A, Q, \to, q^\circ, Q^\omega \rangle \stackrel{s}{\Rightarrow} \langle A, Q, \to, q', Q^\omega \rangle$ for each $q^\circ \stackrel{s}{\Rightarrow} q'$. 
The notation $P \stackrel{s}{\Rightarrow}$ states that $P \stackrel{s}{\Rightarrow} P'$ for some $P' \in \Pi_A$. 

The possible behaviours of a process are defined by the set of action sequences 
or traces it can execute. The language $\mathcal{L}(P)$ and the success language $\mathcal{M}(P)$ of 
$P \in \Pi_A$ are defined as 

    $\mathcal{L}(P) \stackrel{\mathrm{def}}{=} \{ s \in A^{*\omega} \mid P \stackrel{s}{\Rightarrow} \}$ and $\mathcal{M}(P) \stackrel{\mathrm{def}}{=} \{ s \in A^*\omega \mid P \stackrel{s}{\Rightarrow} \}$ .    (8) 

$\mathcal{L}(P)$ contains all complete or incomplete traces that can be executed by a 
process, including or not including the termination action $\omega$. This is a prefix-closed 
language. In contrast, $\mathcal{M}(P)$ contains only traces ending with $\omega$, i.e., only those 
traces that lead to successful termination. 

2.3 Synchronous Product 

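When multiple processes are running in parallel, lock-step synchronisation in 
the style of [10] is used. The synchronous product $P_1 \| P_2$ of two processes 
$P_1 = \langle A, Q_1, \to_1, q_1^\circ, Q_1^\omega \rangle$ and $P_2 = \langle A, Q_2, \to_2, q_2^\circ, Q_2^\omega \rangle$, both using the action 
alphabet $A$, is defined as 

    $P_1 \| P_2 \stackrel{\mathrm{def}}{=} \langle A, Q_1 \times Q_2, \to, (q_1^\circ, q_2^\circ), Q_1^\omega \times Q_2^\omega \rangle$ ,    (9) 

where $(q_1, q_2) \xrightarrow{a} (q_1', q_2')$ if and only if $q_1 \xrightarrow{a}_1 q_1'$ and $q_2 \xrightarrow{a}_2 q_2'$, for all $a \in A^\omega$. 

Synchronisation is performed on all actions, including the termination action $\omega$. 
Two processes can only terminate together when both are ready to terminate. 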
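
For finite processes the product construction can be sketched in a few lines of Python. The encoding is an assumption of this illustration: a process is a 5-tuple (alphabet, states, transitions, initial, success) with transitions given as (source, action, target) triples, and joint termination is captured by taking pairs of success states rather than by explicit ω-transitions.

```python
def sync_product(p1, p2):
    """Lock-step synchronous product of two processes over the same alphabet."""
    alphabet, states1, trans1, init1, succ1 = p1
    _, states2, trans2, init2, succ2 = p2
    # A joint step exists only if both components can perform the same action.
    transitions = {
        ((s1, s2), a, (t1, t2))
        for (s1, a, t1) in trans1
        for (s2, b, t2) in trans2
        if a == b
    }
    states = {(s1, s2) for s1 in states1 for s2 in states2}
    # The product can terminate only when both components can terminate.
    success = {(s1, s2) for s1 in succ1 for s2 in succ2}
    return (alphabet, states, transitions, (init1, init2), success)

p1 = ({"a", "b"}, {0, 1}, {(0, "a", 1)}, 0, {1})
p2 = ({"a", "b"}, {0, 1}, {(0, "a", 1), (1, "b", 1)}, 0, {1})
alphabet, states, transitions, initial, success = sync_product(p1, p2)
```

In the example, the `b`-loop of `p2` disappears from the product because `p1` never offers `b`, so the only joint transition is the `a`-step from `(0, 0)` to `(1, 1)`.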

2.4 Conflicts 

Given a process $P \in \Pi_A$, it is desirable that every trace in $\mathcal{L}(P)$ can be 
completed to a trace in $\mathcal{M}(P)$, otherwise $P$ may become unable to terminate. In 
discrete event systems theory, a process that may become unable to terminate is 
called blocking. This concept becomes more interesting when multiple processes 
are running in parallel; in this case the term conflicting is used instead. The 
following extends the standard definitions [21] to the case of nondeterministic 
processes. 

Definition 1. A process $P \in \Pi_A$ is said to be nonblocking, if for every trace 
$s \in A^*$ and every $P' \in \Pi_A$ such that $P \stackrel{s}{\Rightarrow} P'$ there exists a continuation $t \in A^*\omega$ 
such that $P' \stackrel{t}{\Rightarrow}$. Otherwise $P$ is said to be blocking. 

Definition 2. Two processes $P_1, P_2 \in \Pi_A$ are said to be nonconflicting if $P_1 \| P_2$ 
is nonblocking. Otherwise they are said to be conflicting. 

These definitions are based on an implicit fairness assumption: the possibility 
of divergence is not considered as a problem. Process $P_2$ in fig. 1, e.g., is 
nonblocking, although it can theoretically execute an infinite sequence of $\alpha$ actions. 
In order to be nonblocking, or nonconflicting, it is sufficient that a terminal state 
can be reached in every possible situation. 
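
For a finite process, Definition 1 amounts to a reachability check: every state reachable from the initial state must be able to reach a success state. The Python sketch below assumes transitions are given as (source, action, target) triples and that reaching a success state stands for the final ω step; the function name `nonblocking` is hypothetical.

```python
def nonblocking(transitions, initial, success):
    """True if every reachable state can still reach a success state."""
    # Forward search: states reachable from the initial state.
    reachable, frontier = {initial}, [initial]
    while frontier:
        q = frontier.pop()
        for source, _, target in transitions:
            if source == q and target not in reachable:
                reachable.add(target)
                frontier.append(target)
    # Backward search: states from which some success state is reachable.
    coreachable, frontier = set(success), list(success)
    while frontier:
        q = frontier.pop()
        for source, _, target in transitions:
            if target == q and source not in coreachable:
                coreachable.add(source)
                frontier.append(source)
    return reachable <= coreachable

looping = {(0, "a", 0), (0, "b", 1)}   # may diverge, but can always finish
stuck = {(0, "a", 1), (0, "a", 2)}     # state 1 is a deadlock
```

`nonblocking(looping, 0, {1})` holds even though the process can execute `a` forever, which reflects the fairness assumption; `nonblocking(stuck, 0, {2})` fails because the nondeterministic `a`-step may lead to the dead state 1. Applying the same check to a synchronous product decides whether two finite processes are conflicting.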
Fig. 1. Example processes. 



3 The Conflict Preorder 

3.1 Finding Conflicts in Large Systems 

An aim of this paper is to provide the framework for algorithms to determine 
whether a large system of concurrent processes is conflicting or not. The 
straightforward approach to check whether processes $P_1, P_2, \ldots, P_n$ are conflicting is to 
construct their synchronous product 

    $P_1 \| P_2 \| \cdots \| P_n$    (10) 

and check whether it is blocking. This is done by checking whether a terminal 
state can be reached from every reachable state. Using symbolic representations 
such as BDDs [4] or IDDs [27], this approach has been used to analyse very large 
models. Yet, the technique always remains limited by the amount of memory 
available to store representations of the synchronous product. 

An alternative approach avoids building the entire synchronous product by 
analysing smaller subsystems first. Modular reasoning can make it possible to 
replace the process $P_1$, e.g., by a simpler version $P_1'$, and analyse the simpler 
system 

    $P_1' \| P_2 \| \cdots \| P_n$ .    (11) 

Such modular reasoning requires an appropriate notion of process 
equivalence. For the sake of conflict analysis, processes $P_1$ and $P_1'$ in fig. 1 are 
equivalent, since any other process $T$ that is nonconflicting with either $P_1$ or $P_1'$ must 
be able to execute $\alpha$ and then be able to continue both with $\beta$ and with $\gamma$. On 
the other hand, processes $P_2$ and $P_2'$ in fig. 1 are not equivalent. A process $T$ 
that can execute any number of $\alpha$ actions followed by $\beta$ is nonconflicting with $P_2$ 
but not with $P_2'$. 

Such a notion of equivalence can be obtained using the process-algebraic 
framework of testing [20,9] by considering conflicts as a testing paradigm. 
Section 3.2 formally introduces the notion of conflict equivalence. Section 3.3 
discusses its congruence properties and shows that it is the best possible equivalence 
to reason about conflicts. Section 3.4 provides a denotational characterisation, 
and sections 3.5 and 3.6 discuss the relationship to other process semantics. 
3.2 Using Conflicts as a Testing Paradigm 

The traditional testing framework [20,9] defines preorders and equivalences that 
relate processes based on their responses to tests. This setting can be applied 
to obtain an equivalence based on conflicts: a test can be any process, and the 
test’s response is the observation whether the test is conflicting with the given 
process or not. Then two processes are equivalent if the responses of all tests are 
equal. 

Definition 3. Let $P_1, P_2 \in \Pi_A$. 

- $P_1$ is less conflicting than $P_2$, written $P_1 \lesssim_{\mathrm{conf}} P_2$, if for every test $T \in \Pi_A$, 
if $P_2$ and $T$ are nonconflicting, then $P_1$ and $T$ are also nonconflicting. 

- $P_1$ and $P_2$ are conflict equivalent, written $P_1 \simeq_{\mathrm{conf}} P_2$, if $P_1 \lesssim_{\mathrm{conf}} P_2$ and 
$P_2 \lesssim_{\mathrm{conf}} P_1$. 

For example, processes $P_1$ and $P_1'$ in fig. 1 are conflict equivalent, while $P_2$ 
and $P_2'$ are not. 

The conflict preorder $\lesssim_{\mathrm{conf}}$ is closely related to the fair testing preorder [2, 
19]. The above definition is based on the assumption that a process $P$ passes 
a test $T$ if and only if $P$ and $T$ are nonconflicting; this idea is called should 
testing and written as $P$ should $T$ in [2]. The definition given here is more 
flexible, because processes and tests synchronise on the termination action $\omega$. 
In traditional testing scenarios, success is determined solely by the test, which 
makes it impossible, e.g., to describe blocking processes directly. 

Given a preorder based on tests such as $\lesssim_{\mathrm{conf}}$, it is of interest [9] to reduce 
the set of tests needed to establish $P_1 \lesssim_{\mathrm{conf}} P_2$. The following proposition shows 
that it is sufficient to consider deterministic tests only. 

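Definition 4. Let $P_1, P_2 \in \Pi_A$. Write $P_1 \lesssim^{\mathrm{det}}_{\mathrm{conf}} P_2$, if for every deterministic 
test $T \in \Pi_A$, if $P_2$ and $T$ are nonconflicting, then $P_1$ and $T$ are also 
nonconflicting. 

Proposition 1. Let $P_1, P_2 \in \Pi_A$. If $P_1 \lesssim^{\mathrm{det}}_{\mathrm{conf}} P_2$ then $P_1 \lesssim_{\mathrm{conf}} P_2$. 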

Proof (sketch). Let $P_1 \lesssim^{\mathrm{det}}_{\mathrm{conf}} P_2$, and assume that $P_1 \lesssim_{\mathrm{conf}} P_2$ does not hold. 
Then there exists a test $T \in \Pi_A$ such that $P_2$ and $T$ are nonconflicting, but $P_1$ 
and $T$ are conflicting. Since $P_1$ and $T$ are conflicting, there exists a trace $s \in A^*$ 
such that $P_1 \| T \stackrel{s}{\Rightarrow} P_1' \| T'$, but there does not exist any $t \in A^*\omega$ such that 
$P_1' \| T' \stackrel{t}{\Rightarrow}$. Then construct a deterministic process $T^{\mathrm{det}} \in \Pi_A$ such that 

    $\mathcal{L}(T^{\mathrm{det}}) = \mathcal{L}(T)$ ;    (12) 

    $\mathcal{M}(T^{\mathrm{det}}) = (\mathcal{M}(T) \setminus sA^{*\omega}) \cup s\mathcal{M}(T')$ .    (13) 

A nondeterministic acceptor of these languages is directly obtained by 
modifying $T$ appropriately; then subset construction [11] is used to make it 
deterministic. Now it can be proven that $P_2$ and $T^{\mathrm{det}}$ are nonconflicting, since $P_2$ 
and $T$ are, but $P_1$ and $T^{\mathrm{det}}$ are conflicting. This contradicts the assumption 
$P_1 \lesssim^{\mathrm{det}}_{\mathrm{conf}} P_2$. □ 
3.3 Congruence Properties 

When studying process preorders such as $\lesssim_{\mathrm{conf}}$, it is an important question how 
these preorders behave when processes are modified or combined by standard 
operations. It is desirable that relationships between two processes are preserved 
when the same operation is applied to both processes. A preorder that satisfies 
this condition is called a pre-congruence. 

The congruence properties of fair testing have been studied in [2]. All these 
results can be extended for the conflict preorder. Below is a proof for the con- 
gruence result with respect to synchronous composition, which is simple in the 
terminology of conflicts. 

Definition 5. Let $\leq\ \subseteq \Pi_A \times \Pi_A$ be a preorder on the set of processes. 

- $\leq$ is a pre-congruence with respect to $\|$ if, for all processes $P_1, P_2, T \in \Pi_A$ 
such that $P_1 \leq P_2$, it follows that $P_1 \| T \leq P_2 \| T$. 

- $\leq$ respects blocking if, for all processes $P_1, P_2 \in \Pi_A$ such that $P_1 \leq P_2$, if $P_2$ 
is nonblocking then $P_1$ also is nonblocking. 

Proposition 2. $\lesssim_{\mathrm{conf}}$ is a pre-congruence with respect to $\|$. 

Proof. Let $P_1 \lesssim_{\mathrm{conf}} P_2$ and $T \in \Pi_A$. To see that $P_1 \| T \lesssim_{\mathrm{conf}} P_2 \| T$, let 
$T' \in \Pi_A$ be a test such that $P_2 \| T$ and $T'$ are nonconflicting. Then $P_2 \| T \| T'$ is 
nonblocking or, equivalently, $P_2$ and $T \| T'$ are nonconflicting. Since $P_1 \lesssim_{\mathrm{conf}} P_2$ 
it follows that $P_1$ and $T \| T'$ are nonconflicting, i.e., $P_1 \| T \| T'$ is nonblocking, 
i.e., $P_1 \| T$ and $T'$ are nonconflicting. Thus $P_1 \| T \lesssim_{\mathrm{conf}} P_2 \| T$. □ 

The conflict preorder $\lesssim_{\mathrm{conf}}$ also is a pre-congruence with respect to other 
process-algebraic operators, including generalised forms of parallel composition, 
hiding, prefixing, and renaming. To have a pre-congruence with respect to 
hiding [22] is particularly important, because it means that this standard method 
can be used to simplify processes in a conflict-preserving way. 

The conflict preorder can be characterised as the coarsest pre-congruence 
with respect to synchronous composition that respects blocking. In other words, 
any process equality that distinguishes processes according to their blocking 
behaviour and preserves synchronous composition is contained in conflict equiv- 
alence. This means that the conflict preorder is the best possible process refine- 
ment for reasoning about conflicts. 

Proposition 3. $\lesssim_{\mathrm{conf}}$ respects blocking. 

Proof. Note that there exists a process $U_A \in \Pi_A$ such that $P \| U_A = P$ for 
every $P \in \Pi_A$. Let $P_1 \lesssim_{\mathrm{conf}} P_2$, and let $P_2$ be nonblocking. Then $P_2$ and $U_A$ are 
nonconflicting. Since $P_1 \lesssim_{\mathrm{conf}} P_2$, it follows that $P_1$ and $U_A$ are nonconflicting, 
i.e., $P_1 \| U_A = P_1$ is nonblocking. □ 

Proposition 4. Let $\leq\ \subseteq \Pi_A \times \Pi_A$ be a pre-congruence with respect to $\|$ which 
respects blocking. Then $P_1 \leq P_2$ implies $P_1 \lesssim_{\mathrm{conf}} P_2$. 

Proof. Let $P_1 \leq P_2$, and let $T \in \Pi_A$ such that $P_2$ and $T$ are nonconflicting. 
Then $P_2 \| T$ is nonblocking. Furthermore, $P_1 \| T \leq P_2 \| T$ since $\leq$ is a 
pre-congruence with respect to $\|$. Since $\leq$ respects blocking it follows that $P_1 \| T$ is 
nonblocking, i.e., $P_1$ and $T$ are nonconflicting. Hence $P_1 \lesssim_{\mathrm{conf}} P_2$. □ 

3.4 The Nonconflicting Completion Semantics 

In addition to the definition of a preorder by referring to arbitrary test 
environments, it is desirable to have a denotational characterisation based on the 
structural properties of a process. Such characterisations have been introduced 
for fair testing in [2,3,19], but they are complicated and hard to relate to 
traditional characterisations such as failures [10,23]. This section proposes an elegant 
characterisation of the conflict preorder, called the nonconflicting completion 
semantics, which has an intuitive interpretation based on the idea of conflicts. 

The idea of the nonconflicting completion semantics is to list for each pro- 
cess P a set of conditions that have to be satisfied by a test T that is to be 
nonconflicting with P. Assume that such a test accepts a trace s that is also 
accepted by P. In order to be nonconflicting with P, such a test will be required 
to be able to terminate in certain ways. Each set of possibilities is combined as a 
language of possible completions, called a nonconflicting completion for s in P. 

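Definition 6. Let $P \in \Pi_A$ be a process, and let $s \in A^*$. A language $\mathcal{C} \subseteq A^*\omega$ 
is called a nonconflicting completion for $s$ in $P$, if for every test $T \in \Pi_A$ such 
that $P$ and $T$ are nonconflicting and $T \stackrel{s}{\Rightarrow} T'$, there exists a completion $t \in \mathcal{C}$ 
such that $T' \stackrel{t}{\Rightarrow}$. 

Definition 7. The nonconflicting completion semantics of $P \in \Pi_A$ is 

    $\mathcal{CC}(P) \stackrel{\mathrm{def}}{=} \{ (s, \mathcal{C}) \in A^* \times \mathcal{P}(A^*\omega) \mid \mathcal{C}$ is a nonconflicting completion for $s$ in $P \}$ .    (14) 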

This construction is similar in style to failures semantics [10]. But here, each 
trace is paired with sets of successful completions, namely with all its
nonconflicting completions. Each pair $(s,C) \in \mathcal{CC}(P)$ describes a condition that needs
to be satisfied by a test that is to be nonconflicting with P. Such a test, if it can 
accept s, must afterwards always be able to terminate with at least one of the 
traces in C. 
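
The notion of a test being nonconflicting with a process can be made concrete on small finite examples. The following is a minimal sketch, not the authors' algorithm: it assumes that processes are finite labelled transition systems without silent transitions, that both processes synchronise on every event, and that termination is a transition labelled by a string `OMEGA`. The process `P1` below is a hypothetical one written in the spirit of Example 1, since the figure itself is not reproduced here; all function names are illustrative.

```python
from collections import deque

OMEGA = "omega"  # termination event (name chosen for this sketch)


def compose(p, t):
    """Reachable part of the synchronous product; every event is shared."""
    (p_init, p_trans), (t_init, t_trans) = p, t
    init = (p_init, t_init)
    product, queue = {}, deque([init])
    while queue:
        state = queue.popleft()
        if state in product:
            continue
        ps, ts = state
        moves = set()
        for event, p_next in p_trans.get(ps, ()):
            for t_event, t_next in t_trans.get(ts, ()):
                if event != t_event:
                    continue
                if event == OMEGA:
                    moves.add((OMEGA, None))  # joint termination
                else:
                    moves.add((event, (p_next, t_next)))
                    queue.append((p_next, t_next))
        product[state] = moves
    return init, product


def nonblocking(lts):
    """True if every state of the LTS can reach a state that enables OMEGA."""
    _, trans = lts
    good = {s for s, moves in trans.items() if (OMEGA, None) in moves}
    changed = True
    while changed:  # backward fixpoint over non-terminating transitions
        changed = False
        for s, moves in trans.items():
            if s not in good and any(n in good for _, n in moves):
                good.add(s)
                changed = True
    return good == set(trans)


def nonconflicting(p, t):
    """P and T are nonconflicting if their composition is nonblocking."""
    return nonblocking(compose(p, t))


# Hypothetical process: after "a" it is either in a state that can only
# finish with "b" or in a state that can only finish with "c".
P1 = (0, {0: {("a", 1), ("a", 2)}, 1: {("b", 3)}, 2: {("c", 3)},
          3: {(OMEGA, None)}})
T_only_b = (0, {0: {("a", 1)}, 1: {("b", 2)}, 2: {(OMEGA, None)}})
T_both = (0, {0: {("a", 1)}, 1: {("b", 2), ("c", 2)}, 2: {(OMEGA, None)}})

print(nonconflicting(P1, T_only_b))  # False
print(nonconflicting(P1, T_both))    # True
```

The test `T_only_b` violates the condition $(a, \{c\,\omega\})$ of this hypothetical process, which is why the composition can reach a state from which joint termination is impossible.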

Example 1. Consider process $P_1$ in fig. 1, and assume that a test $T$ running in
parallel with $P_1$ accepts action $\alpha$. In order to be nonconflicting with $P_1$, the
test $T$, after having executed $\alpha$, must be able to accept $\beta$ and then terminate.
This is because, after execution of $\alpha$, process $P_1$ may be in a state from which
the only possibility to terminate is $\beta$. Therefore, the nonconflicting completion
semantics $\mathcal{CC}(P_1)$ of $P_1$ contains the pair $(\alpha, \{\beta\omega\})$. For similar reasons, it
contains the pair $(\alpha, \{\gamma\omega\})$.
Fig. 2. Example with a not well-founded set of nonconflicting completions.



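Example 2. Consider process $P_2$ in fig. 1, and let $T$ be a test accepting $\alpha$. To
be nonconflicting with $P_2$, $T$ must be able to terminate after at least one of the
traces $\beta, \alpha\gamma, \alpha\alpha\beta, \ldots$ and at least one of the traces $\gamma, \alpha\beta, \alpha\alpha\gamma, \ldots$ Therefore,
$\mathcal{CC}(P_2)$ contains $(\alpha, \{(\alpha\alpha)^*\beta\omega, (\alpha\alpha)^*\alpha\gamma\omega\})$ and $(\alpha, \{(\alpha\alpha)^*\gamma\omega, (\alpha\alpha)^*\alpha\beta\omega\})$.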

Consider a process $P \in \Pi_A$ such that $P \stackrel{s}{\Rightarrow} P'$. Then any test $T$ that is
nonconflicting with $P$, after having executed $s$, must be able to terminate together
with $P'$. Thus the set of all possible ways in which $P'$ can terminate, i.e., the
success language of $P'$, forms a nonconflicting completion for $s$ in $P$. This result
is easy to prove.

Lemma 5. Let $P, P' \in \Pi_A$ and $s \in A^*$ such that $P \stackrel{s}{\Rightarrow} P'$. Then
$(s, \mathcal{M}(P')) \in \mathcal{CC}(P)$.

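Proof. It is to be proven that $\mathcal{M}(P')$ is a nonconflicting completion for $s$ in $P$.
Let $T \in \Pi_A$ be nonconflicting with $P$, and $T \stackrel{s}{\Rightarrow} T'$. Then
$P \| T \stackrel{s}{\Rightarrow} P' \| T'$. Since $P$ and $T$ are nonconflicting, there exists a
completion $t \in A^*\omega$ such that $P' \| T' \stackrel{t}{\Rightarrow}$. Hence $T' \stackrel{t}{\Rightarrow}$ and
$P' \stackrel{t}{\Rightarrow}$, i.e., $t \in \mathcal{M}(P')$ by definition of $\mathcal{M}(P')$. □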

A process can have many more nonconflicting completions. The nonconflicting
completion semantics is upward closed, i.e., if $(s,C) \in \mathcal{CC}(P)$, then it
immediately follows that $(s,C') \in \mathcal{CC}(P)$ for any language $C' \supseteq C$. The following
example shows that a process can have other nonconflicting completions still.

Example 3. In order to be nonconflicting with process $P_3$ in fig. 2, a test must
initially be able to accept at least one of the traces $\alpha\beta, \alpha\alpha\beta, \alpha\alpha\alpha\beta, \ldots$ Therefore,
any such test must be able to execute $\alpha$ in its initial state. However, any test
executing $\alpha$ must also be able to cope with $P_3$ being put back to its initial state
by executing the selfloop in the initial state. Therefore, such a test also has to
accept at least one of the traces $\alpha\alpha\beta, \alpha\alpha\alpha\beta, \alpha\alpha\alpha\alpha\beta, \ldots$ in its initial state. It
follows that $\mathcal{CC}(P_3)$ contains all the pairs $(\varepsilon, \{\alpha^n\alpha^*\beta\omega\})$ for $n \geq 1$.

This example shows that nonconflicting completions can be proper subsets 
of the success languages of subprocesses. In addition, it points out that, given a 
trace s, there does not necessarily exist a minimal nonconflicting completion. 

The nonconflicting completion semantics is an accurate characterisation of 
the conflict preorder. The following result shows that, for all processes, being 
less conflicting is equivalent to having fewer conditions in their nonconflicting 
completion semantics. It follows that two processes are conflict equivalent if and 
only if their nonconflicting completion semantics coincide. 
Theorem 6. Let $P_1, P_2 \in \Pi_A$. Then $P_1 \lesssim_{\mathrm{conf}} P_2$ if and only if
$\mathcal{CC}(P_1) \subseteq \mathcal{CC}(P_2)$.

Proof. First, assume that $P_1 \lesssim_{\mathrm{conf}} P_2$, and let $(s,C) \in \mathcal{CC}(P_1)$. Then
$C \subseteq A^*\omega$ is a nonconflicting completion for $s$ in $P_1$. To prove that $C$ is a
nonconflicting completion for $s$ in $P_2$, let $T \in \Pi_A$ be nonconflicting with $P_2$,
and $T \stackrel{s}{\Rightarrow} T'$. Since $P_1 \lesssim_{\mathrm{conf}} P_2$, it follows that $P_1$ and $T$ are
nonconflicting. Since $T \stackrel{s}{\Rightarrow} T'$ and $C$ is a nonconflicting completion for $s$ in
$P_1$, there exists a completion $t \in C$ such that $T' \stackrel{t}{\Rightarrow}$. Therefore $C$ is a
nonconflicting completion for $s$ in $P_2$.

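Second, assume that $\mathcal{CC}(P_1) \subseteq \mathcal{CC}(P_2)$, and let $T \in \Pi_A$ be nonconflicting
with $P_2$. To prove that $P_1$ and $T$ are nonconflicting, let $s \in A^*$ such that
$P_1 \| T \stackrel{s}{\Rightarrow} Q$. Then there exist processes $P_1', T' \in \Pi_A$ such that
$P_1 \stackrel{s}{\Rightarrow} P_1'$, $T \stackrel{s}{\Rightarrow} T'$, and $Q = P_1' \| T'$. By lemma 5 it follows that

$(s, \mathcal{M}(P_1')) \in \mathcal{CC}(P_1) \subseteq \mathcal{CC}(P_2)$ , (15)

i.e., $\mathcal{M}(P_1')$ is a nonconflicting completion for $s$ in $P_2$. Then, since $P_2$ and $T$
are nonconflicting and $T \stackrel{s}{\Rightarrow} T'$, there exists a completion $t \in \mathcal{M}(P_1')$ such
that $T' \stackrel{t}{\Rightarrow}$. By definition of $\mathcal{M}(P_1')$, it follows that $t \in A^*\omega$ and
$P_1' \stackrel{t}{\Rightarrow}$. This means $Q = P_1' \| T' \stackrel{t}{\Rightarrow}$, i.e., $P_1$ and $T$ are nonconflicting. □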

3.5 The Set of Certain Conflicts 

Every process can be associated with a language of certain conflicts, which char- 
acterises its potential to cause conflicts in larger contexts. This language is stud- 
ied from the viewpoint of discrete event systems in [15]. It can be used to simplify 
processes in a conflict-preserving way, and to detect conflicts in large systems 
without constructing their synchronous products explicitly. In the following, the 
set of certain conflicts is introduced in an alternative way, using the nonconflict- 
ing completion semantics. 

If $(s,\emptyset) \in \mathcal{CC}(P)$ for some $s \in A^*$, then any test capable of accepting $s$ and
being nonconflicting with $P$ has to be able to continue with a trace from $\emptyset$. Such
a test cannot exist. In other words, every test that can accept s is necessarily 
conflicting with P. 

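Definition 8. Let $P \in \Pi_A$. Then define

$\mathrm{Conf}(P) = \{\, s \in A^* \mid (s, \emptyset) \in \mathcal{CC}(P) \,\}$ ; (16)

$\mathrm{NConf}(P) = \{\, s \in A^* \mid (s, \emptyset) \notin \mathcal{CC}(P) \,\}$ . (17)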

The language Conf(P) is called the set of certain conflicts of a process P, 
because it contains all those traces that necessarily lead to a conflict when ac- 
cepted by a test running together with P. Its complement NConf(P) contains 
all those traces that are accepted by some test which is nonconflicting with P. 

For nonblocking processes, the set of certain conflicts obviously is empty. 
Yet, the concept becomes useful and interesting when blocking processes are 
considered. 
Example 4. Consider the processes in fig. 3. Any test that initially executes $\beta$
is conflicting with $P_4$, and likewise with $P_4'$. Therefore $\mathrm{Conf}(P_4) = \beta A^*$. The
situation is similar for $P_5$, where any test initially executing $\alpha$ is in conflict. But
actually any test that is to be nonconflicting with $P_5$ must initially accept $\alpha$ to
make $P_5$ reach its only marked state. Therefore $\mathrm{Conf}(P_5) = A^*$.

An immediate consequence of the definition is that the set Conf(P) of cer- 
tain conflicts decreases when a process becomes less conflicting, and that the 
set NConf(P) increases at the same time. 

Proposition 7. Let $P_1, P_2 \in \Pi_A$ such that $P_1 \lesssim_{\mathrm{conf}} P_2$. Then

$\mathrm{Conf}(P_1) \subseteq \mathrm{Conf}(P_2)$ ; (18)

$\mathrm{NConf}(P_1) \supseteq \mathrm{NConf}(P_2)$ . (19)

Proof. Follows immediately from definition 8 and theorem 6. □ 

The language NConf(P) defines a most general behaviour that can be ac- 
cepted by any test that is to be nonconflicting with a given process P. Therefore, 
it can be used to construct a most general test that is nonconflicting with P. 

Definition 9. Let $P \in \Pi_A$ be a process. Define the process $N_P$ to be the
smallest deterministic process such that

$\mathcal{L}(N_P) = \mathrm{NConf}(P)\omega$ ; (20)

$\mathcal{M}(N_P) = \mathrm{NConf}(P)\omega$ . (21)

By this construction, $N_P$ is the deterministic process which accepts all traces
in $\mathrm{NConf}(P)$ and can terminate successfully after each of them. This is the
largest possible behaviour that can be nonconflicting with $P$. The following
result shows that this process is nonconflicting with $P$; obviously any process
that accepts a trace not in $\mathrm{NConf}(P)$ accepts a trace in $\mathrm{Conf}(P)$, and therefore
is conflicting with $P$.

Proposition 8. Let $P \in \Pi_A$ be a process. Then $P$ and $N_P$ are nonconflicting.

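Proof. Let $s \in A^*$ such that $P \| N_P \stackrel{s}{\Rightarrow} P' \| N_P'$. Then $s \in \mathcal{L}(N_P)$, and by
definition of $N_P$ it follows that $s \in \mathrm{NConf}(P)$. Thus, there exists a test $T \in \Pi_A$
such that $T \stackrel{s}{\Rightarrow} T'$ and $T$ is nonconflicting with $P$. Then $P \| T \stackrel{s}{\Rightarrow} P' \| T'$,
and there exists $t \in A^*\omega$ such that $P' \| T' \stackrel{t}{\Rightarrow}$. This means that $st \in \mathcal{M}(P)$
and $st \in \mathcal{L}(T)$. The latter implies $st \in \mathrm{NConf}(P)\omega = \mathcal{M}(N_P)$. Since $N_P$ is
deterministic and $N_P \stackrel{s}{\Rightarrow} N_P'$, this implies $N_P' \stackrel{t}{\Rightarrow}$. Hence
$P' \| N_P' \stackrel{t}{\Rightarrow}$, i.e., $P$ and $N_P$ are nonconflicting. □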
Fig. 3. Examples of blocking processes.



3.6 Relationship to Other Process Semantics 

The linear time-branching time spectrum [23] contains a wide range of existing 
process equivalences. This section explores the relationship of conflict equiva- 
lence with known equivalences and shows that it is situated somewhere between 
bisimulation and failures equivalence. 

One of the finest process equivalences is bisimulation equivalence [18], which 
keeps track of the complete branching structure of process behaviours. This is 
more than enough to distinguish possible conflicts with other processes. It has 
been established [2] that bisimulation equivalent processes are also fair testing 
equivalent, and this result immediately extends to conflict equivalence. Thus, 
any algorithms relying on bisimulations can also be used for conflict analysis. 

The other end of the spectrum contains the trace and failures preorders and 
equivalences as the coarsest relationships between processes. The trace preorder 
simply compares the languages of two processes, while the failures preorder com- 
pares their failures semantics [10,23]. 

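Definition 10. The failures semantics of $P \in \Pi_A$ is

$\mathcal{F}(P) = \{\, (s,F) \in A^* \times \mathbb{P}(A_\omega) \mid P \stackrel{s}{\Rightarrow} P' \text{ and there does not exist any } \varphi \in F \text{ such that } P' \stackrel{\varphi}{\Rightarrow} \,\}$ . (22)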

Failures semantics associates with each trace s its immediate failures, i.e., 
sets of actions which the process may fail to accept after having executed s. 
This not only differs from nonconflicting completion semantics in that failures 
are considered instead of successful completions; failures semantics furthermore 
considers single actions instead of traces. 
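
To make the contrast with completions concrete, the failures of a small finite process can be enumerated directly. The sketch below is only an illustration under simplifying assumptions: processes are finite labelled transition systems without silent transitions, the alphabet (including a termination event named `"omega"`) is passed explicitly, and `P1` is a hypothetical process that commits to either `"b"` or `"c"` after `"a"`. The function names are illustrative, not taken from the paper.

```python
def after(trans, states, event):
    """States reachable from any state in `states` by one `event` step."""
    return {n for s in states for e, n in trans.get(s, ()) if e == event}


def maximal_refusals(lts, trace, alphabet):
    """One maximal refusal set per state the process may be in after `trace`."""
    init, trans = lts
    states = {init}
    for event in trace:
        states = after(trans, states, event)
    return {
        frozenset(alphabet - {e for e, _ in trans.get(s, ())}) for s in states
    }


def is_failure(lts, trace, refusal, alphabet):
    """(trace, refusal) is a failure if some reachable state refuses all of it."""
    return any(
        set(refusal) <= r for r in maximal_refusals(lts, trace, alphabet)
    )


ALPHABET = {"a", "b", "c", "omega"}
# Hypothetical process: "a" leads nondeterministically to a state offering
# only "b" or to a state offering only "c"; both then terminate.
P1 = (0, {0: {("a", 1), ("a", 2)}, 1: {("b", 3)}, 2: {("c", 3)},
          3: {("omega", None)}})

print(is_failure(P1, ["a"], {"b"}, ALPHABET))       # True
print(is_failure(P1, ["a"], {"b", "c"}, ALPHABET))  # False
```

Each failure here speaks about a single next action, whereas a nonconflicting completion for the same trace would list whole terminating continuations.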

It is known [2] that failures equivalence differs from fair testing equivalence, 
and therefore also from conflict equivalence. For example, processes P2 and P2 in 
fig. 1 are failures equivalent but not conflict equivalent. Furthermore, the failures 
preorder is strictly coarser than the fair testing preorder, and the two preorders 
coincide for finite processes [19]. This does not hold for conflict equivalence, 
when blocking processes are taken into account. 

Example 5. Processes $P_4$ and $P_4'$ in fig. 3 are conflict equivalent, since exactly
those tests that can initially execute $\beta$ are conflicting with either process. But
their failures semantics are incomparable, because the two processes fail on
different actions after executing $\beta$. In fact, even the languages of the two processes
are incomparable.
Apparently, the failures of conflict equivalent processes may differ completely
in parts of the behaviour which are blocking. This can be fixed by taking the set 
of certain conflicts into account. 

Definition 11. The blocking-failures semantics of $P \in \Pi_A$ is

$\mathcal{F}_{bl}(P) = \mathcal{F}(P \| N_P) \cup \mathrm{Conf}(P) \times \mathbb{P}(A_\omega)$ . (23)

This modification of failures semantics considers all actions that lead into the 
set of certain conflicts as failures. In addition, it does not make any distinction 
for traces that are in the set of certain conflicts. This induces another preorder,
called the blocking-failures preorder, which can be shown to be coarser than the
conflict preorder.

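Definition 12. Let $P_1, P_2 \in \Pi_A$. Write $P_1 \lesssim_{F,bl} P_2$ if $\mathcal{F}_{bl}(P_1) \subseteq \mathcal{F}_{bl}(P_2)$.

Proposition 9. Let $P_1, P_2 \in \Pi_A$. If $P_1 \lesssim_{\mathrm{conf}} P_2$ then $P_1 \lesssim_{F,bl} P_2$.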

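Proof (sketch). Let $(s,F) \in \mathcal{F}_{bl}(P_1)$. If $s \in \mathrm{Conf}(P_2)$, it follows immediately
that $(s,F) \in \mathcal{F}_{bl}(P_2)$.

Therefore assume $s \in \mathrm{NConf}(P_2)$. Then $s \in \mathrm{NConf}(P_1)$ by proposition 7.
Thus $(s,F) \in \mathcal{F}_{bl}(P_1)$ implies $(s,F) \in \mathcal{F}(P_1 \| N_{P_1})$. Therefore let
$P_1 \| N_{P_1} \stackrel{s}{\Rightarrow} P_1' \| N_{P_1}'$ such that there does not exist any $\varphi \in F$ with
$P_1' \| N_{P_1}' \stackrel{\varphi}{\Rightarrow}$. Now let
$\hat{F} = \{\, s\varphi t \in A^*\omega \mid \varphi \in A_\omega \setminus F \,\}$, and construct a deterministic test
$T \in \Pi_A$ such that

$\mathcal{L}(T) = \mathrm{NConf}(P_2)\omega \setminus \hat{F}$ and $\mathcal{M}(T) = \mathrm{NConf}(P_2)\omega \setminus \hat{F}$ . (24)

$T$ can be constructed from $N_{P_2}$ by removing all extensions of $s$ via an action
not in $F$ from its behaviour. Therefore $N_{P_2} \| T = T$ and by proposition 7 also
$N_{P_1} \| T = T$. Furthermore note that $s \in \mathcal{L}(T)$.

It can be proven that $P_1 \| N_{P_1}$ and $T$ are conflicting. Since $P_1 \lesssim_{\mathrm{conf}} P_2$,
it follows that $P_2 \| N_{P_1}$ and $T$ are conflicting. Thus, $P_2$ and $N_{P_1} \| T = T$ are
conflicting. Because of the close relationship between $T$ and $N_{P_2}$ this conflict
must be caused by $s$, i.e.,

$P_2 \| T \stackrel{s}{\Rightarrow} P_2' \| T'$ , (25)

and there does not exist any $t \in A^*\omega$ such that $P_2' \| T' \stackrel{t}{\Rightarrow}$. Now assume that
$(s,F) \notin \mathcal{F}_{bl}(P_2)$, i.e., $(s,F) \notin \mathcal{F}(P_2 \| N_{P_2})$. Since $s \in \mathrm{NConf}(P_2)$ and
$P_2 \stackrel{s}{\Rightarrow} P_2'$, there exists $\varphi \in F$ such that

$P_2 \| N_{P_2} \| T \stackrel{s}{\Rightarrow} P_2' \| N_{P_2}' \| T' \stackrel{\varphi}{\Rightarrow} P_2'' \| N_{P_2}'' \| T''$ . (26)

But since $P_2$ and $N_{P_2}$ are nonconflicting, $P_2''$ and $N_{P_2}'' \| T'' = T''$ can terminate
together, contradicting the above result that there does not exist any $t \in A^*\omega$
such that $P_2' \| T' \stackrel{t}{\Rightarrow}$. □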

This result shows that the conflict preorder implies the blocking-failures pre- 
order, i.e., it generalises the known relationships [19] between the fair testing 
and failures preorder to the case of blocking processes. A similar relationship 
with a modified form of trace preorder can also be established. 
4 Conclusions

The notion of conflicts [8] from discrete event systems theory has been intro- 
duced, and its relationship to process-algebraic testing semantics has been ex- 
plored. It has been shown that conflicts induce a testing preorder closely related 
to fair testing [2, 3, 19], and that most of the known properties of fair testing 
apply to this conflict preorder. Through the introduction of the set of certain
conflicts, known results about fair testing have been extended to the case of
blocking processes. Furthermore, the conflict preorder has been shown to be
the best possible preorder for modular reasoning about conflicts. 

The problem of checking whether a large system of concurrent processes is
nonconflicting remains a challenging research problem, because it is very hard
to analyse this property in a modular way [24]. An alternative to brute-force
model checking [17] could be to simplify subsystems using abstractions [7,15]
that preserve possible conflicts with the rest of the system.

The results of this paper provide the mathematical framework needed to com- 
pute conflict-preserving abstractions of state transition systems. In the future, 
the authors plan to apply these results to design algorithms that can perform 
such abstractions automatically, and use them to analyse large discrete event 
systems in a modular way. The goal is to develop efficient techniques for verify- 
ing complex systems to be conflicting or nonconflicting. 
Abstract. Testing equivalence is a quite powerful way of expressing
security properties of cryptographic protocols, but its formal verifica- 
tion is a difficult task, because it is based on the universal quantifica- 
tion over contexts. A technique based on state exploration to address 
this verification problem has been previously presented; it relies on an 
environment-sensitive labelled transition system (ES-LTS) and on sym- 
bolic term representation. This paper shows that such a technique can 
be enhanced by exploiting symmetries found in the ES-LTS structure. 
Experimental results show that the proposed enhancement can substan- 
tially reduce the size of the ES-LTS and that the technique as a whole 
compares favorably with respect to related work. 



1 Introduction 

Due to the increasing importance of secure distributed applications the formal 
verification of cryptographic protocols is being extensively studied by several re- 
searchers, through the investigation of proof techniques, based on various proof 
systems and description formalisms [1,4,20,21], or state exploration methods [9, 
12,14,15,16,17]. The latter requires modelling the protocol behaviour as a rea- 
sonably sized finite state system, which generally entails the introduction of 
simplifying assumptions and numerical bounds that can reduce the accuracy of 
the analysis. Nevertheless, this kind of verification has the invaluable advantage 
of being fully automatic. 

This paper is focused on the spi calculus [2], a process algebra derived from
the $\pi$-calculus [18] with some simplifications and the addition of cryptographic
operations. The testing equivalence formulation of security properties introduced 
in [2] is more accurate than alternative formulations based on the intruder knowl- 
edge [10]; however, the efficient verification of testing equivalence is not trivial 
because of the universal quantification over testers: testing equivalence means
that two processes are indistinguishable for any tester process, and there are
infinitely many such processes. This problem has been addressed initially in [1]
and [4], where tractable proof methods were introduced. Instead, [12] defined a
spi calculus dialect and its brutus logic [8], aiming at the definition of a
theoretical framework for model checking a set of logic properties, possibly wider
than those expressed by testing equivalence, on the spi calculus.

More recently, a method for checking the spi calculus testing equivalence 
using exhaustive state exploration instead of theorem proving has been presented 
in [10]; there, the problem of the quantification over contexts is solved in a way 
similar to the one reported in [4], i.e. by defining an Environment-Sensitive 
Labelled Transition System (ES-LTS), which describes the possible evolutions 
of the protocol principals and of the corresponding intruder knowledge. In [10] it 
has been shown that trace equivalence defined on such an ES-LTS is a necessary 
and sufficient condition for testing equivalence. 

In order to have a finite model which can be analysed by state exploration, 
only spi calculus descriptions having a finite number of processes are considered, 
thus ruling out the infinite replication operator of the spi calculus, and symbolic 
techniques are used to get a finite representation of the infinite set of data 
that the intruder can send each time a protocol principal performs an input. 
To further reduce the model size and keep it within reasonable bounds, this 
paper introduces a symmetry-based reduction method, which cuts off duplicated 
behaviours that can be identified by inspection of the state behaviour expression. 
When multiple, concurrent sessions of the same protocol are involved, as often 
happens in practice, such symmetries show up extensively. The feasibility of 
the proposed method and the increase of performance achieved by symmetry-based
reductions are shown by means of a preliminary version of an automatic
verification tool, called S³A (Symbolic Spi calculus Specifications Analyser).

This paper assumes the reader to be familiar with basic cryptographic tech- 
niques and the spi calculus; it is organised as follows: Sects. 2 and 3 recall the spi 
calculus language and the ES-LTS model, respectively. Then, Sect. 4 presents the 
theory of our symmetry-based reduction technique. Sect. 5 shows an example, 
and Sect. 6 gives some experimental results and comparisons. Sect. 7 concludes 
the paper and discusses possible further developments. 

2 The Spi Calculus 

The syntax of the spi calculus provides for two basic language elements: terms, 
which represent data (e.g. messages, channel identifiers, keys, key pairs, integers), 
and processes, which represent behaviours. Terms can be either atomic constants 
and variables, or structured terms built using term composition operators. 

Table 1 outlines the syntax of the spi calculus [2]. The left-hand side of the
table shows the term grammar and presents some naming conventions used in
this paper. Besides term specification, the spi calculus also provides a rich set
of process algebraic operators, used to build behaviour expressions; they are
summarised in the right-hand side of Table 1.

In addition, $fv(P)$ and $fc(P)$ denote the set of free variables and constants
occurring in process $P$, respectively. Both sets can easily be computed by
syntactically inspecting process $P$; for simplicity, it is assumed that name
overloading is not allowed. Also, $\mathcal{A}$ denotes the set of spi calculus names, and
$\mathcal{M}(\mathcal{A})$ the set of all spi calculus terms that can be built starting from $\mathcal{A}$.
The usual implicit assumptions about perfect encryption apply in the spi calculus
as well as in other similar specification formalisms.

Table 1. Syntax of the spi calculus

$\sigma, \rho, \theta$ ::=   terms                      $P, Q, R$ ::=                          processes
  $m$                constant                     $\overline{\sigma}\langle\rho\rangle.P$                        output
  $0$                the zero constant            $\sigma(x).P$                            input
  $x, y$             variables                    $P \mid Q$                               composition
  $(\sigma, \rho)$          pair                         $(\nu m)\,P$                             restriction
  $suc(\sigma)$          successor                    $0$                                  nil
  $H(\sigma)$            hashing                      $[\sigma \text{ is } \rho]\,P$                       match
  $\{\sigma\}_\rho$           shared-key encryption        let $(x, y) = \sigma$ in $P$               pair splitting
  $\sigma^+$             public part                  case $\sigma$ of $0 : P$  $suc(x) : Q$      integer case
  $\sigma^-$             private part                 case $\sigma$ of $\{x\}_\rho$ in $P$             shared-key decr.
  $\{[\sigma]\}_\rho$         public-key encryption        case $\sigma$ of $\{[x]\}_\rho$ in $P$           decryption
  $[\{\sigma\}]_\rho$         private-key signature        case $\sigma$ of $[\{x\}]_\rho$ in $P$           signature check

3 The Environment-Sensitive LTS Model 

The environment-sensitive labelled transition system (ES-LTS) defined in [10] 
describes all possible interactions of a given spi calculus process with its envi- 
ronment. In such a model, each time the spi calculus process executes an input, 
the environment can send it any data term that can be built starting from the 
current environment’s knowledge. Since the set of such terms is infinite, it is 
represented symbolically by a so-called generic term, so as to have a finite
ES-LTS. Each state of the ES-LTS is denoted $(K \rhd P)_{\Upsilon,\Lambda}$ and is made up of the
current spi calculus process $P$, the current environment's knowledge $K$, and a
specification of how symbolic terms occurring in $P$ and $K$ must be interpreted,
$(\Upsilon, \Lambda)$. Each ES-LTS transition takes the general form:

{K\>P)r^A^ {K' t> P’)r',A' , (1) 

where /i and are synthetic representations of the action performed by process 
P and of the complementary action performed by the environment, respectively. 

3.1 Knowledge Representation 

The environment’s knowledge is represented in a minimised and canonical form. 
It includes a set of messages learned by the intruder, with a labelling on its 
elements that uniquely identifies them according to the order in which they have 
been added to the set. The need for such a labelling descends from the features 
of testing equivalence. If we are just interested in checking simple security prop- 
erties related to what data the intruder can generate, no labelling is needed. If 
instead we are interested in verifying testing equivalence, we have to model also 
the intruder's ability to classify the data items of its knowledge according to when
they have been learned. For example, the spi processes
$P = \overline{c}\langle M\rangle.\overline{c}\langle N\rangle.\overline{c}\langle M\rangle.0$ and $Q = \overline{c}\langle M\rangle.\overline{c}\langle N\rangle.\overline{c}\langle N\rangle.0$ are not testing
equivalent, although the set of data that the intruder can learn from each of them
is the same at each step. Indeed, a simple test that can distinguish between $P$
and $Q$ is the one that checks whether the third received data item is equal to the
first one. The labelling on the environment's knowledge data items solves this
problem and, at the same time, makes it possible to identify such data items
independently of their particular value, which is another key feature needed in
checking testing equivalence. For example, the spi processes
$P = (\nu k)\,\overline{c}\langle\{M\}_k\rangle.0$ and $Q = (\nu k)\,\overline{c}\langle\{N\}_k\rangle.0$ are testing equivalent although
their outputs differ; this can readily be recognised by comparing the identifiers
that the intruder assigns to the learned data, instead of comparing the data
themselves.
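
The effect of the labelling can be illustrated with a small sketch. It is not the knowledge representation of [10]; it assumes, for illustration only, that every output is an opaque item the intruder can merely compare for equality, and that an item is identified by the position at which it was first learned. The function name `observation_labels` and the tuple encoding of ciphertexts are hypothetical.

```python
def observation_labels(outputs):
    """Replace each observed item by the index at which it was first learned."""
    knowledge = {}  # item -> index, in order of first occurrence
    labels = []
    for item in outputs:
        if item not in knowledge:
            knowledge[item] = len(knowledge)
        labels.append(knowledge[item])
    return labels


# P and Q output the same set of data, but in distinguishable patterns.
P = ["M", "N", "M"]
Q = ["M", "N", "N"]
print(set(P) == set(Q))                                   # True
print(observation_labels(P), observation_labels(Q))      # [0, 1, 0] [0, 1, 1]

# Ciphertexts under a restricted key are opaque to the intruder, so the two
# single-output processes yield the same label sequence.
P_enc = [("enc", "M", "k")]
Q_enc = [("enc", "N", "k")]
print(observation_labels(P_enc) == observation_labels(Q_enc))  # True
```

Comparing label sequences rather than raw data separates the first pair of processes and identifies the second pair, which mirrors the two examples in the text.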

The environment's knowledge is formally represented by a bijective function
$K : S \to L$, where the domain $S$ is the set of terms that the intruder has learned,
and the image $L$ is the finite set of indexes uniquely identifying them. We denote
by $I$ the countable, ordered set of indexes ($L \subseteq I$). Of course, the intruder's term
generation capabilities depend on $S$. If $\sigma \in \mathcal{M}(\mathcal{A})$ is a finite term, we say that
$\sigma$ can be produced by $K$, written $K \vdash \sigma$, iff $\sigma$ belongs to the closure of $S$ with
respect to the spi calculus term composition operators, i.e. $\sigma \in \widehat{S}$. Similarly, we
say that $\sigma$ can be produced by $\Sigma$, written $\Sigma \vdash \sigma$, iff $\sigma \in \widehat{\Sigma}$. As shown in [10],
$\Sigma$ is always kept in a minimised and canonical form. The decidability of $K \vdash \sigma$
and of $\Sigma \vdash \sigma$ has been proved in [10] and in [6], where the algorithms to check
them and to incrementally compute $\Sigma$ are given. In the following, $K' = f(\rho, K)$
denotes the new knowledge that is reached from $K$ after the environment has
observed the data term $\rho$.
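
The producibility check can be sketched for a fragment of the term language. The code below is a simplified closure computation in the usual analysis-then-synthesis style, and is not the algorithm of [10] or [6]: it assumes that terms are either atoms (strings) or tuples tagged `"pair"` or `"enc"` (shared-key encryption, with the key as last component), and it ignores hashing, successor and public-key operators. All names are illustrative.

```python
def derivable(term, known):
    """Synthesis: `term` is known, or built from derivable components."""
    if term in known:
        return True
    if isinstance(term, tuple) and term[0] in ("pair", "enc"):
        return all(derivable(part, known) for part in term[1:])
    return False


def analyse(terms):
    """Analysis: split pairs and open ciphertexts whose key is derivable."""
    known = set(terms)
    changed = True
    while changed:
        changed = False
        for t in list(known):
            parts = ()
            if isinstance(t, tuple) and t[0] == "pair":
                parts = t[1:]
            elif (isinstance(t, tuple) and t[0] == "enc"
                  and derivable(t[2], known)):
                parts = (t[1],)
            for part in parts:
                if part not in known:
                    known.add(part)
                    changed = True
    return known


def produces(terms, term):
    """True if `term` lies in the closure of the learned `terms`."""
    return derivable(term, analyse(terms))


learned = {("enc", "M", "k"), ("pair", "k", "N")}
print(produces(learned, "M"))                    # True
print(produces(learned, ("pair", "M", "N")))     # True
print(produces({("enc", "M", "k")}, "M"))        # False
```

In the first case the key `k` is extracted from the pair and then used to open the ciphertext; without the key the ciphertext can only be forwarded as it is.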

3.2 Symbolic Data Representation 

As already noted, an input action of the current spi calculus process is repre- 
sented in the ES-LTS by means of a generic data term, which represents the 
infinite set of terms that the environment can generate at that moment. Generic 
terms are added to the spi calculus by extending the language with an additional 
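infinite and countable set of names $\Gamma$, such that $\Gamma \cap \mathcal{A} = \emptyset$ and $\Gamma \cap I = \emptyset$. In
the rest of this paper, $\gamma$ ranges over $\Gamma$. A generic term is a spi calculus term
$\theta \in \mathcal{M}(\mathcal{A} \cup I \cup \Gamma)$ which has at least a subterm $\gamma \in \Gamma$.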

Each symbolic ES-LTS state is characterised by a (finite) current set of atomic
generic terms $G \subseteq \Gamma$. A function $\Upsilon$ defined on $G$, which is part of the
symbolic ES-LTS state, gives the current interpretation of atomic generic terms.
Each $\gamma \in G$ is mapped onto a corresponding knowledge function domain $\Upsilon(\gamma)$
that represents the set of terms that was available to the intruder when $\gamma$ was
generated.

When the term represented by $\gamma$ is generated by the intruder it can take
the whole set of values $\Upsilon(\gamma)$, and a different subsequent behaviour is possible for
each one of them. However, for the purpose of testing equivalence such behaviours
are indistinguishable from one another, and can be represented as a single
symbolic behaviour, until the occurrence of some operation is conditioned by the
value that was initially exchanged. Whenever this happens, only the behaviours
corresponding to values that satisfy the condition are allowed to proceed, and
this is symbolically described by narrowing the set of terms represented by each
atomic generic term $\gamma$ down to the largest subset of $\Upsilon(\gamma)$ compatible with the
operations performed. In most cases, a narrowing of this kind is equivalent to the
substitution of one or more atomic generic terms with corresponding specialised
generic terms or concrete terms. A specialised generic term is a compound term
having at least one atomic generic term as subterm. For example, a pair splitting
operation on a generic term $\gamma$ narrows the set of concrete terms represented by
$\gamma$ so as to include only pairs of elements of $\Upsilon(\gamma)$, which is equivalent to applying
the substitution $((\gamma', \gamma'')/\gamma)$, where $\gamma'$ and $\gamma''$ are two new atomic generic terms
representing the two components of the pair. Substitutions like the one presented
above are called specialisations, because they substitute atomic generic
terms with corresponding specialised or concrete ones. For this reason, atomic
generic terms are also called unspecialised generic terms. In the rest of this paper,
$\xi$ ranges over specialisations.

Specialisations are not powerful enough to precisely describe any kind of
narrowing that can occur on the sets of terms represented by generic terms.
For this reason, extended narrowing specifications are introduced, which make it
possible to specify that a given specialisation must occur, while one or more
further specialisations must not occur. Such narrowing specifications take the form
of a pair $(\xi, S_\Lambda)$ where $\xi$ is the specialisation that must be applied and
$S_\Lambda = \{\xi_1, \ldots, \xi_k\}$ is the set of forbidden further specialisations. Of course,
$(\xi, \emptyset) = \xi$.

As long as the computation proceeds, forbidden specialisations are added up,
and the set of specialisations obtained as the union of all forbidden specialisations
accumulated up to the current state is denoted $\Lambda$. It complements $\Upsilon$ in specifying
the current interpretation of generic terms.

The set of all the specialisations that can be applied in the current state
depends on $\Upsilon$ and $\Lambda$ only and is denoted $S_{\Upsilon,\Lambda}$. When a specialisation $\xi$ is applied
to a symbolic state, $\Upsilon$ and $\Lambda$ are updated accordingly, the new interpretation
being denoted $\Upsilon\{\xi\}, \Lambda\{\xi\}$.

3.3 Canonical Representation 

As already mentioned, when checking testing equivalence it is necessary to ab- 
stract away from the exact value of the exchanged data, the only relevant thing 
being how such data are related to the current intruder knowledge. The
canonical representation of a term $\sigma$ with respect to an intruder knowledge $K$
expresses how $\sigma$ is related to $K$. This concept can be introduced by extending
the notion of substitution. The substitutions originally introduced in [2] act on
atomic terms only. If this constraint is relaxed, we can have substitution lists
$\lambda = \sigma_1/\rho_1, \sigma_2/\rho_2, \ldots, \sigma_n/\rho_n$, where some $\rho_i$ are non-atomic terms. If $\lambda$ is one of
such extended substitutions, the postfix operator $[\lambda]$ replaces each occurrence of
term $\rho_i$ with term $\sigma_i$.
The canonical representation of a term $\sigma$ with respect to an intruder
knowledge $K$ is a spi calculus term defined over the extended set of names $\mathcal{A} \cup \Gamma \cup I$,
obtained by applying to $\sigma$ a substitution that replaces each subterm $\rho \in Dom(K)$
by its corresponding unique identifier $K(\rho)$. Such a substitution is represented
by a substitution list composed of an item $K(\rho)/\rho$ for each $\rho \in Dom(K)$. With
abuse of notation, this substitution is denoted $K$, and, consequently, the
canonical representation is written $\sigma[K]$.

Since $[K]$ substitutes each occurrence of $\rho$ with its corresponding index $K(\rho)$,
$\sigma[K]$ actually specifies how $\sigma$ can be built using the data items available in the
intruder knowledge, each one identified by its index. If $K \vdash \sigma$, then
$\sigma[K] \in \mathcal{M}(\Gamma \cup I)$, i.e. the canonical representation of a term that can be produced by
$K$ does not contain any spi calculus name, but only generic terms and indexes,
because it can be built using the elements of the intruder knowledge only. In a
similar way, it is possible to define the canonical representation of any object
containing spi calculus terms.
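
The substitution $[K]$ can be sketched as a recursive rewrite. The code below is an illustration only and makes several assumptions that are not in the paper: terms are atoms (strings) or tagged tuples, the knowledge function is a Python dictionary from learned terms to integer indexes, an index is rendered as the tuple `("idx", i)`, and a subterm is replaced as soon as it is found in the domain of `K` (outermost first).

```python
def canonical(term, K):
    """Replace every subterm that belongs to Dom(K) by its index K(term)."""
    if term in K:
        return ("idx", K[term])
    if isinstance(term, tuple):
        # Keep the constructor tag, rewrite the components.
        return (term[0],) + tuple(canonical(part, K) for part in term[1:])
    return term


# Two runs in which the intruder learned different ciphertexts first.
K1 = {("enc", "M", "k"): 0, "N": 1}
K2 = {("enc", "M2", "k"): 0, "N": 1}

t1 = ("pair", ("enc", "M", "k"), "N")
t2 = ("pair", ("enc", "M2", "k"), "N")

print(canonical(t1, K1))                       # ('pair', ('idx', 0), ('idx', 1))
print(canonical(t1, K1) == canonical(t2, K2))  # True
```

Both terms have the same canonical representation because each is the pair of the first and the second learned item, regardless of the concrete values involved.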

3.4 The ES-LTS Derivation System 

Transitions can be categorised into three different types, taking the following 
forms: 



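$(K \rhd P)_{\Upsilon,\Lambda} \xmapsto[\;\xi[K']\;]{\;\tau\;} (K' \rhd P')_{\Upsilon',\Lambda'}$ (2)

$(K \rhd P)_{\Upsilon,\Lambda} \xmapsto[\;(\xi,S_\Lambda)[K'],\,\delta_K\;]{\;\overline{\sigma[\xi][K']}\;} (K' \rhd P')_{\Upsilon',\Lambda'}$ (3)

$(K \rhd P)_{\Upsilon,\Lambda} \xmapsto[\;\gamma\;]{\;\sigma[K]\;} (K \rhd P')_{\Upsilon',\Lambda}$ (4)

All labels (including the component denoted as $\delta_K$) are canonical representations
with respect to the new intruder knowledge $K'$.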

Transitions taking form (2) are related to synchronisation events occurring
inside the spi process. In this case, the process action is the special symbol $\tau$,
which represents an internal synchronisation. The complementary action label
may contain a pure specialisation, in which case the transition is referred to as
a specialisation transition.

A transition taking form (3) is referred to as an output transition and
represents an output on channel $\sigma$. The complementary action label includes an
item, denoted $\delta_K$, which describes how the intruder knowledge is affected by the
event, and may contain an extended narrowing specification $(\xi, S_\Lambda)$. The process
action specifies the channel name $\sigma$ after the application of specialisation $\xi$. The
overline symbol indicates, as in spi calculus, that the operation performed by
the process on the channel is an output.

A transition taking form (4) is referred to as an input transition and repre- 
sents an input from channel a. It implies a data transfer from the environment 
to the process. Thus, no modification in the environment knowledge takes place. 
The process action is analogous to the previous one, whereas the complementary 
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action label is a new generic term symbolically representing any data term that 
can be generated by the intruder. 

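The ES-LTS derivation system is an extension of the derivation system defined
in [2] for the reaction relation. The main rules specify when input and output
transitions may take place:

$$\frac{(\xi,\delta_\Lambda) \in \theta(\rho, K_{\Gamma,\Lambda})}{(K \rhd \overline{\sigma}\langle\rho\rangle.P)_{\Gamma,\Lambda} \xrightarrow[(\xi,\delta_\Lambda)[K'],\ (\delta_K^-(\rho),\,\delta_K^+(\rho),\,\rho)[K']]{\overline{\sigma[\xi]}[K']} (K' \rhd P[\xi])_{\Gamma',\Lambda'}} \qquad (5)$$

$$\frac{K \vdash \sigma \qquad \gamma \notin dom(\Gamma)}{(K \rhd \sigma(x).P)_{\Gamma,\Lambda} \xrightarrow[\gamma]{\sigma[K]} (K \rhd P[\gamma/x])_{\Gamma \cup \{(\gamma,\,dom(K))\},\Lambda}} \qquad (6)$$

where:

$$f_{(\xi,\delta_\Lambda)}(\rho, K_{\Gamma,\Lambda}) = f(\rho[\xi], K[\xi]_{\Gamma\{\xi\},(\Lambda \cup \delta_\Lambda)\{\xi\}}) \qquad (7)$$

$$\theta(\rho, K_{\Gamma,\Lambda}) = \{(\xi,\delta_\Lambda) \mid \xi \in S_{\Gamma,\Lambda},\ \delta_\Lambda \in S_\Gamma,\ f_{(\xi,\delta_\Lambda)}(\rho, K_{\Gamma,\Lambda}) \in \mathcal{K}_{\Gamma,\Lambda}\} \qquad (8)$$

Function $f_{(\xi,\delta_\Lambda)}$ is the symbolic version of function $f$
after the application of the narrowing specified by $(\xi,\delta_\Lambda)$. As
can be seen in (7), it results from the composition of the knowledge
transformation implied by the application of $\xi$, the addition of
$\delta_\Lambda$ to $\Lambda$, and the knowledge transformation described by
function $f()$. $\theta(\rho, K_{\Gamma,\Lambda})$ is the set of narrowings
$(\xi,\delta_\Lambda)$ that make $f_{(\xi,\delta_\Lambda)}(\rho, K_{\Gamma,\Lambda})$
a valid knowledge function. $\mathcal{K}_{\Gamma,\Lambda}$ is the set of all
valid knowledge functions.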

It is worth noting that the new knowledge function $K'_{\Gamma',\Lambda'}$,
reached after the output of $\rho$, may depend on how generic terms are
specialised. Therefore, there may be more than one possible
$K'_{\Gamma',\Lambda'}$, each one corresponding to a different transition and
to a different element of $\theta(\rho, K_{\Gamma,\Lambda})$. In rule (5), the
$\delta_K$ of rule (3) has been expanded as
$(\delta_K^-(\rho), \delta_K^+(\rho), \rho)[K']$, where:

— $\rho$ is the term that the process sent in output on the public channel
$\sigma$, and that has been observed by the environment,

— $\delta_K^-(\rho)$ is the set of all elements that are eliminated from $K$ in
the transformation from $K$ to $K'$ when $\rho$ is received by the environment,

— $\delta_K^+(\rho)$ is the set of terms that become decipherable after $\rho$
has been received by the environment, without being eliminated from the
intruder's knowledge domain.

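The semantics of specialisation transitions is described by the following rules:

$$\frac{(K \rhd P)_{\Gamma,\Lambda} \xrightarrow[\xi]{\tau} (K \rhd P')_{\Gamma,\Lambda}\{\xi\}}{(K \rhd (P \mid Q))_{\Gamma,\Lambda} \xrightarrow[\xi]{\tau} (K \rhd (P' \mid Q))_{\Gamma,\Lambda}\{\xi\}} \qquad (9)$$

$$\frac{\xi \in \sigma \bullet \rho}{(K \rhd (\overline{\sigma}\langle\theta\rangle.P \mid \rho(x).Q))_{\Gamma,\Lambda} \xrightarrow[\xi]{\tau} (K \rhd (\overline{\sigma}\langle\theta\rangle.P \mid \rho(x).Q))_{\Gamma,\Lambda}\{\xi\}} \qquad (10)$$

$$\frac{\xi \in \sigma \bullet \rho}{(K \rhd ([\sigma\ is\ \rho]\,P))_{\Gamma,\Lambda} \xrightarrow[\xi]{\tau} (K \rhd ([\sigma\ is\ \rho]\,P))_{\Gamma,\Lambda}\{\xi\}} \qquad (11)$$

$$\frac{\gamma', \gamma'' \notin dom(\Gamma)}{(K \rhd (let\ (x,y) = \gamma\ in\ P))_{\Gamma,\Lambda} \xrightarrow[(\gamma',\gamma'')/\gamma]{\tau} (K \rhd (let\ (x,y) = \gamma\ in\ P))_{\Gamma,\Lambda}\{(\gamma',\gamma'')/\gamma\}} \qquad (12)$$

$$(K \rhd (case\ \gamma\ of\ 0:P\ \ suc(x):Q))_{\Gamma,\Lambda} \xrightarrow[0/\gamma]{\tau} (K \rhd (case\ \gamma\ of\ 0:P\ \ suc(x):Q))_{\Gamma,\Lambda}\{0/\gamma\} \qquad (13)$$

$$\frac{\gamma' \notin dom(\Gamma)}{(K \rhd (case\ \gamma\ of\ 0:P\ \ suc(x):Q))_{\Gamma,\Lambda} \xrightarrow[suc(\gamma')/\gamma]{\tau} (K \rhd (case\ \gamma\ of\ 0:P\ \ suc(x):Q))_{\Gamma,\Lambda}\{suc(\gamma')/\gamma\}} \qquad (14)$$

$$\frac{\xi \in \theta \circ \rho}{(K \rhd (case\ \theta\ of\ \{x\}_\rho\ in\ P))_{\Gamma,\Lambda} \xrightarrow[\xi]{\tau} (K \rhd (case\ \theta\ of\ \{x\}_\rho\ in\ P))_{\Gamma,\Lambda}\{\xi\}} \qquad (15)$$
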

Rule (9) specifies how the parallel composition operator is dealt with. The
other rules specify all the situations in which specialisation transitions can
occur. The operators $\bullet$ and $\circ$ are unification operations. Each one
of them yields the minimal sets of specialisations that, when applied, make
some condition true. More precisely, $\sigma \bullet \rho$ is the set of
specialisations that must be applied to $\sigma$ and $\rho$ to match them.
Formally, $\sigma \bullet \rho = \{\xi \in S_{\Gamma,\Lambda} \mid \sigma[\xi] = \rho[\xi]\}$.
Similarly, $\sigma \circ \rho$ yields the specialisations that make $\sigma$ a
term encrypted under key $\rho$. Rule (15), dealing with shared-key decryption,
has a counterpart for private and public-key decryption, not shown here for
conciseness.

ES-LTS traces are defined in the usual way as sequences of transition labels. 
In [10] it is proved that trace equivalence defined on the ES-LTS coincides with 
testing equivalence. 

4 Symmetry-Based Reductions 

The ES-LTS generation and the consequent trace comparison suffer from the
state-explosion problem. In order to reduce the number of states, several
techniques [8,22] have been proposed and adopted, mainly in the field of
reachability analysis; earlier, symmetry-based techniques were employed in the
Petri net community, mainly to check for the existence of deadlocks [13]. Our
approach is mainly inspired by the pioneering work of [22] but, to the best of
our knowledge, this is the first time that a symmetry-based reduction technique
has been applied to improve the efficiency of automatic testing equivalence
verification.

In [22], two kinds of symmetry are analysed and exploited: process and state 
symmetry. The latter cuts down the number of outgoing edges from a given 
state of the model state space by performing a partition of the set of processes 
based on their local state, and then constructing and storing only the edges 
corresponding to a single, representative process (the leader process) for each 
equivalence class. Similarly, our technique reduces the number of transitions 
departing from a given ES-LTS state to be explicitly considered by relying on the 
fact that, under suitable process equivalence conditions, all processes in the same 
equivalence class lead to the same traces in the ES-LTS; so, in the construction 
of the ES-LTS itself it is enough to take only one process per class into account. 
Unlike [8,22], our approach also deals with the symbolic data representation in 
the ES-LTS and is suitable for the more powerful testing equivalence verification. 

Of course, the notion of process equivalence must be made clear, keeping in
mind that each process also has a context, consisting of the intruder knowledge
$K$, the function $\Gamma$ and the set of forbidden substitutions $\Lambda$;
thus, process equivalence shall also involve context equivalence. The notion of
equivalence among processes proposed here is based on the syntactical identity
of processes up to a substitution of both the constant and free variable names,
that is, two processes $P_i$ and $P_j$ are equivalent iff there exists a
suitable substitution which maps the names of constants and free variables of
the first process onto the constants and free variables of the second process.

This substitution induces a process equivalence only if the context is
unaffected by the substitution. This trivially holds for the free variable
names: given a finite spi calculus process
$(\nu\tilde{m})(P_1 \mid \ldots \mid P_n)$, where $(\nu\tilde{m})$ is used as a
shorthand notation for $(\nu m_1) \cdots (\nu m_n)$, the spi calculus semantics
guarantees that $fv(P_i) \cap fv(P_j) = \emptyset$ for each $i \neq j$ with
$i,j \in [1, \ldots, n]$.

For constants, more attention must be paid, because we must make sure 
that a substitution involving constants does not change the semantics of the 
behaviour expression in effect at the starting ES-LTS state. If this condition 
does not hold, there is no process equivalence, because the substitution could be 
indirectly observed by other processes. 

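The relation $\mathcal{R}$ informally described above can now be formally cast.
It can be proved with ordinary effort that $\mathcal{R}$ is symmetric,
reflexive and transitive, so $\mathcal{R}$ is indeed an equivalence relation
and can be used to partition a parallel composition of spi calculus processes.
Since spi calculus processes are finite, the partition of their parallel
sub-processes can be computed algorithmically. Given a spi process
$(\nu\tilde{m})(P_i \mid P_j \mid P)$ with context $K$, $\Lambda$ and $\Gamma$,
we define $\mathcal{R}$ as:

$$P_i \mathrel{\mathcal{R}} P_j \iff \exists \lambda_{ij} : \left\{ \begin{array}{l}
\lambda_{ij} \text{ is a bijective substitution} \\
P_i[\lambda_{ij}] = P_j \\
\lambda_{ij} = \lambda_{ij}^{-1} \\
Dom(\lambda^v_{ij}) = Im(\lambda^v_{ij}) = fv(P_i) \cup fv(P_j) \\
Dom(\lambda^c_{ij}) = Im(\lambda^c_{ij}) = fc(P_i) \cup fc(P_j) \\
K[\lambda_{ij}] = K,\ \Lambda[\lambda_{ij}] = \Lambda,\ \Gamma[\lambda_{ij}] = \Gamma \\
(\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P) = (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P)[\lambda_{ij}]
\end{array} \right. \qquad (16)$$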



Above, a substitution $\lambda$ has been explicitly partitioned, when useful,
into two disjoint parts: $\lambda^c$ involving only constant names, and
$\lambda^v$ involving only free variable names. Relations
$Dom(\lambda^v_{ij}) = Im(\lambda^v_{ij}) = fv(P_i) \cup fv(P_j)$ and
$Dom(\lambda^c_{ij}) = Im(\lambda^c_{ij}) = fc(P_i) \cup fc(P_j)$ mean that
constant names replace constant names and variables replace variables. The
guarantee that the substitution does not affect the context is given by
$K[\lambda_{ij}] = K$, $\Lambda[\lambda_{ij}] = \Lambda$, and
$\Gamma[\lambda_{ij}] = \Gamma$, which refer to the sets giving the state
interpretation, and by
$(\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P) = (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P)[\lambda_{ij}]$,
which states that the substitution leaves unchanged the syntactical form of the
behaviour expression of the state. As an example, let us consider the following
process:

$P_1 \mid P_2 = c(x).c(y).c(z).[x\ is\ z]\ c(y).0 \mid c(t).c(w).c(u).[t\ is\ u]\ c(w).0$,

(with $K = \emptyset$, $\Gamma = \emptyset$ and $\Lambda = \emptyset$). The
substitution $\lambda = \{t/x, w/y, u/z, x/t, y/w, z/u\}$ satisfies (16) and
$P_1[\lambda] = P_2$ (and $P_1 = P_2[\lambda]$). If we consider instead:

$P_1 \mid P_2 = c(x).c(y).c(z).[x\ is\ z]\ c(y).0 \mid c(t).c(w).c(u).[w\ is\ u]\ c(w).0$,

there is no substitution satisfying (16): in fact we would have to map $w$ on
$y$ in the second input event, and on $x$ in the test construct, so we cannot
build an injective substitution. This occurs because in $P_2$ the comparison is
between the second and the third input data, whereas $P_1$ compares the first
and the third input data. Let us now consider a process:

$(\nu\tilde{m})(P_1 \mid P_2 \mid P_3) = (\nu M)(\nu N)(c(x).c(M).0 \mid c(y).c(N).0 \mid c(M).0)$,

with $K = \emptyset$, $\Gamma = \emptyset$ and $\Lambda = \emptyset$: here
$\lambda = \{x/y, y/x, M/N, N/M\}$ gives $P_1 = P_2[\lambda]$, but
$P_3[\lambda] \neq P_3$, thus the last requirement on the substitution
($(\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P) = (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P)[\lambda_{ij}]$)
is violated. As a last example let us consider the process:

$(\nu\tilde{m})(P_1 \mid P_2) = (\nu M)(\nu N)(c(x).c(M).0 \mid c(y).c(N).0)$,

with context $K = \{(c, \iota_0), (M, \iota_1)\}$, $\Gamma = \emptyset$ and
$\Lambda = \emptyset$. A substitution $\lambda = \{x/y, y/x, M/N, N/M\}$ gives
the equivalence between $P_1$ and $P_2$, but the environment can distinguish
$M$ from $N$, since $M$ is already in the attacker's knowledge. In fact, the
requirement $K[\lambda] = K$ is violated in this case.
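
The syntactic part of the first two examples can be checked mechanically. The
following is a minimal sketch under stated assumptions, not the procedure used
by the authors: each process is flattened into the list of its variable names
in order of occurrence (a hypothetical encoding chosen for illustration), and
the code only looks for a consistent one-to-one renaming between the two
lists. The conditions of (16) on the context and on the remaining processes
are not modelled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: class and method names are hypothetical.
final class RenamingCheck {
    /**
     * Tries to build a one-to-one renaming that maps the i-th name of
     * 'left' onto the i-th name of 'right'. Returns empty if some name
     * would need two images, or if two names would share one image.
     */
    static Optional<Map<String, String>> bijection(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            return Optional.empty();
        }
        Map<String, String> forward = new HashMap<>();
        Map<String, String> backward = new HashMap<>();
        for (int i = 0; i < left.size(); i++) {
            String a = left.get(i);
            String b = right.get(i);
            String previousImage = forward.putIfAbsent(a, b);
            String previousSource = backward.putIfAbsent(b, a);
            boolean clashForward = previousImage != null && !previousImage.equals(b);
            boolean clashBackward = previousSource != null && !previousSource.equals(a);
            if (clashForward || clashBackward) {
                return Optional.empty();
            }
        }
        return Optional.of(forward);
    }

    public static void main(String[] args) {
        // Names of c(x).c(y).c(z).[x is z] c(y).0 in order of occurrence.
        List<String> p1 = List.of("x", "y", "z", "x", "z", "y");
        // c(t).c(w).c(u).[t is u] c(w).0: the renaming x->t, y->w, z->u works.
        System.out.println(bijection(p1, List.of("t", "w", "u", "t", "u", "w")).isPresent());
        // c(t).c(w).c(u).[w is u] c(w).0: w would have to be the image of y and of x.
        System.out.println(bijection(p1, List.of("t", "w", "u", "w", "u", "w")).isPresent());
    }
}
```

The first call finds a renaming and the second does not, in line with the two
examples. A fuller check would also have to verify that the renaming leaves
the context and the other parallel processes unchanged, as the last two
examples show.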

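The following theorems hold, whose proofs can be found in [7]:

Theorem 1. Given a spi process $P$ with context $K$, $\Gamma$, $\Lambda$ and a
bijective substitution $\lambda$ on constant and free variable names such that
$\lambda = \lambda^{-1}$, then:

$$(K \rhd P)_{\Gamma,\Lambda} \xrightarrow[\phi]{\mu} (K' \rhd P')_{\Gamma',\Lambda'} \iff (K \rhd P)_{\Gamma,\Lambda}[\lambda] \xrightarrow[\phi]{\mu} (K' \rhd P')_{\Gamma',\Lambda'}[\lambda] .$$

Theorem 2. Given a context $K$, $\Gamma$, $\Lambda$, a process
$P = (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P)$, and a substitution
$\lambda_{ij}$ defined as in (16), then:

$$(K \rhd (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P))_{\Gamma,\Lambda} \xrightarrow[\phi]{\mu} (K' \rhd (\nu\tilde{m})(P_i' \mid P_i[\lambda_{ij}] \mid P'))_{\Gamma',\Lambda'}$$

$$\Updownarrow$$

$$(K \rhd (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P))_{\Gamma,\Lambda} \xrightarrow[\phi]{\mu} (K' \rhd (\nu\tilde{m})(P_i' \mid P_i[\lambda_{ij}] \mid P'))_{\Gamma',\Lambda'}[\lambda_{ij}]$$
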
[Figure 1: diagram of the partial ES-LTS. Each state box shows the intruder
knowledge K (upper part) and the spi calculus process P (lower part); arcs are
labelled with the process action and the complementary action. The box contents
are not recoverable from the extracted text.]


Fig. 1. A quite simple protocol: partial ES-LTS with states and traces 



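Because of (16): $K[\lambda_{ij}] = K$, $\Lambda[\lambda_{ij}] = \Lambda$,
$\Gamma[\lambda_{ij}] = \Gamma$,
$(\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P) = (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P)[\lambda_{ij}]$,
and $\lambda_{ij} = \lambda_{ij}^{-1}$, so we have:

$(K \rhd (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P))_{\Gamma,\Lambda} = (K \rhd (\nu\tilde{m})(P_i \mid P_i[\lambda_{ij}] \mid P))_{\Gamma,\Lambda}[\lambda_{ij}]$ ,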

and the hypotheses of Theorem 1 hold. Theorem 2 states that, under suitable 
hypotheses, two identical transitions are allowed that originate from the same 
state and lead to two states that differ only by a substitution of constants and free 
variable names, i.e. where hypotheses of Theorem 1 still hold. Thus, Theorem 2 
can be recursively applied to obtain two (sub)-ES-LTSs with the same transition 
labels and corresponding states, barring a substitution. In practice, since one of 
the two (sub)-ES-LTSs has exactly the same labels as the other, the redundant 
(sub)-ES-LTS can be omitted, both in generating and comparing traces. 

5 An Example 

Fig. 1 shows how our reduction technique works on two parallel sessions of a
simple protocol inspired by [11], where two agents A and B share a secret key
$k$ and exchange two messages. The initial intruder's knowledge is assumed to
be $K = \{(c, \iota_0)\}$. In Fig. 1 each ES-LTS state is represented by a box
showing the intruder knowledge $K$ (upper part) and the spi calculus
specification of the process $P$ (lower part). Thus, the spi calculus
specification of the protocol as a whole can be found in the lower part of the
box that corresponds to state 0. Set $\Lambda$ is always empty in this case, so
it is not shown, and $\Gamma$ is not explicitly represented since it can be
easily deduced from the state where each generic term was generated. Each arc
connecting a pair of states has been labelled with $\mu$ and $\phi$, on the
left and right side respectively. In this example $\xi$ and $\delta_\Lambda$
are always empty in the complementary action $\phi$ of output transitions, and
$\delta_K$ is the same as the output term $\rho$, so $\phi$ is simply
represented as $\rho[K']$. Moreover, internal events on channels known to the
intruder are not represented, since they only generate pure $\tau$ labels and
do not contribute to traces. Dashed states and their corresponding thin input
lines are those discovered redundant by our technique. Dots under thick states
mean that the evolution of the ES-LTS produces other states not shown in the
picture.

State 0 can be partitioned into a single equivalence class, containing two
instances of
$\overline{c}\langle\{M\}_k\rangle.c(x).[x\ is\ H(M)]\,0 \mid c(x).case\ x\ of\ \{y\}_k\ in\ \overline{c}\langle H(y)\rangle.0$.
The substitution simply maps each $x$ and $y$ of the first instance onto the
corresponding ones of the second. It must be noted that the free variables $x$
and $y$ of one instance are different from their corresponding counterparts,
because of their different scope.

The output event (action) performed by the first instance leads to the state
marked 11 in Fig. 1, whereas state 12 is reached by considering the
corresponding event of the second instance. The labels of the two transitions
are the same (output of message $\iota_1$ on channel $\iota_0$) and the two
states differ only in the role of their free variables, i.e. they are the same
state except for a substitution of the variable names. Thus the sub-traces
generated starting with state 11 are the same as those starting from state 12.
The same reasoning applies to states 13 and 14, obtained by considering the
input event (action) of the two instances. Here a generic term $\gamma_0$ is
created by the input action on channel $\iota_0$. Restarting with state 12, we
have three equivalence classes: two are made up of
$\overline{c}\langle\{M\}_k\rangle.c(x).[x\ is\ H(M)]\,0$ and
$c(x).[x\ is\ H(M)]\,0$ respectively, whereas the third one contains two
instances of $c(x).case\ x\ of\ \{y\}_k\ in\ \overline{c}\langle H(y)\rangle.0$.
These two instances lead to states 22 and 24, where a similar reasoning can be
applied.

6 Experimental Results and Related Work 

A preliminary version of a tool, called S³A, that fully implements the
technique described in [10] and the enhancement discussed in this paper, has
been used to verify several cryptographic protocols and, in particular, to test
the efficiency of our proposed symmetry-based reduction technique. Without
symmetry-based reduction, the ES-LTS of Fig. 1 has 2,206,186 states, while
their number drops to 215,268 when symmetry-based reductions are used (less
than 10% of the original size).

Moreover, we compared our approach with other tools implementing similar
reduction techniques for cryptographic protocol analysis. To the best of our
knowledge, only the BRUTUS model checker [9] implements symmetry-based
reduction techniques to speed up the verification of security properties of
cryptographic protocols. BRUTUS also uses partial order reductions to further
cut down the number of states but, since we are interested in symmetries, in
this paper we limit our analysis to this topic.

In [9] numerical results are given about the analysis of three popular
protocols, namely 1KP [3], Needham-Schroeder with public key (N-S) [19] and the
Table 2. Experimental results

                              BRUTUS                      S³A
Protocol  Init  Resp       None        Symm           None         Symm
N-S        1     1        1,208       1,208             81           81
N-S        1     2    1,227,415     613,713         36,233       17,365
N-S        2     2            X           X      9,007,163    2,176,344
N-S        2     3            X           X              X  390,126,070
WMF        3     3            X           X              X   40,959,126



Wide Mouthed Frog protocol (WMF) [5]. For our purposes, the analysis of the
last two protocols is more interesting, since it has been carried out with an
increasing number of instances for each role, giving an idea of the exponential
growth of the number of states when the number of instances of each role
increases.

The first column of Table 2 shows the protocol name acronym and the number
of instances of the initiator and responder roles respectively. The second
column shows the number of states generated by BRUTUS when no reductions are
applied (None sub-column) and when symmetry-based reductions are used (Symm
sub-column). The rightmost column gives results obtained with S³A. An 'X'
symbol is used when the number of states is too big to be computed with
reasonable resources (> 700,000,000 states). We based the comparison on the
number of states instead of execution time because, to the best of our
knowledge, no execution time information is publicly available for BRUTUS.

Although BRUTUS also implements partial order reductions, whose results are
not shown here, it must be pointed out that, when the comparison is carried out
using the same reduction technique, S³A clearly performs better. The second and
third rows of Table 2 show that the compression ratio on the total number of
states achieved by symmetry-based reductions is about 1 : 2 when both tools
work on the same problem "N-S 1 2" (613,713/1,227,415 and 17,365/36,233), and
that the ratio achieved by S³A improves, reaching 1 : 4.14, when the problem
size increases, as in "N-S 2 2" (2,176,344/9,007,163). This result demonstrates
that symmetry-based reductions perform better when the number of instances of
roles grows, and more symmetries can be exploited. For the same reason, it is
easy to understand why they do not yield any advantage for "N-S 1 1". It can
also be noted that S³A performs better than BRUTUS in absolute terms, both with
and without the help of symmetry-based reductions: in fact the ratio between
the number of states generated by the tools falls between
1 : 15 (81 : 1,208) and 1 : 35 (17,365 : 613,713).
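
The ratios quoted above follow directly from the state counts in Table 2. The
short sketch below simply redoes that arithmetic; the class and method names
are hypothetical and only the figures themselves come from the table.

```java
// Illustrative arithmetic on the state counts reported in Table 2.
final class ReductionRatios {
    /** Returns how many times larger 'larger' is than 'smaller'. */
    static double factor(long larger, long smaller) {
        return (double) larger / smaller;
    }

    public static void main(String[] args) {
        // Effect of symmetry-based reduction within each tool (None vs Symm).
        System.out.printf("BRUTUS, N-S 1 2: 1 : %.2f%n", factor(1_227_415L, 613_713L));
        System.out.printf("S3A,    N-S 1 2: 1 : %.2f%n", factor(36_233L, 17_365L));
        System.out.printf("S3A,    N-S 2 2: 1 : %.2f%n", factor(9_007_163L, 2_176_344L));
        // S3A against BRUTUS on the same problem.
        System.out.printf("N-S 1 1, no reduction: 1 : %.1f%n", factor(1_208L, 81L));
        System.out.printf("N-S 1 2, symmetry:     1 : %.1f%n", factor(613_713L, 17_365L));
    }
}
```

The third line reproduces the 1 : 4.14 figure, and the last two lines give
the bounds rounded in the text to 1 : 15 and 1 : 35.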

The difference is mainly due to the fact that BRUTUS is a concrete model
checker, i.e. when the process performs an input action, all the messages the
intruder can build starting from its knowledge have to be explicitly
considered. This number is infinite even if the knowledge is finite, thus the
exhaustive message generation is restricted by means of an artificial upper
limit on the size of the messages the intruder is enabled to generate. Although
this limit makes the problem tractable, it potentially implies a restriction on
the attacks that the technique can detect, and the input event management still
remains a critical point for the explosion of the number of states. On the
other hand, S³A adopts a symbolic representation of input messages, with a
twofold advantage: input events are not a potential point of state explosion
and, since there are no limitations on message lengths and the like, our state
generation is exhaustive. Moreover, S³A also deals with non-atomic keys and
checks safety properties by means of testing equivalence verification, which
allows secrecy properties to be formulated in a more accurate way than in
systems like BRUTUS.

7 Conclusions 

The results presented in this paper extend those achieved previously in the 
field of automatic verification of security protocols, because they give a viable 
alternative to the use of theorem proving for the verification of complex security 
properties based on testing equivalence. 

With respect to [4], the step-by-step knowledge equivalence verification is no
longer needed, because the ES-LTS transition labels introduced in [10]
incorporate all the information needed to verify testing equivalence. Moreover,
[4] deals with a spi calculus dialect where public-key encryption, hashing,
integers, and non-atomic keys are not considered. In addition, symmetries
arising from multiple parallel sessions are exploited by a reduction technique
which limits the size of the model to be checked.

The advantages of this technique, which are difficult to quantify
theoretically, have been verified with the S³A tool: its underlying theoretical
framework [10] is more sophisticated and complex than many others, since it
deals with testing equivalence, thus allowing a finer-grained analysis of
secrecy properties than reachability analysis. Moreover, it does not suffer
from drawbacks such as handling only atomic keys or limitations on the size of
messages. Despite this greater generality, S³A has shown encouraging results,
even better than those coming from other tools based on a simpler and more
limited theoretical approach.

Further improvements of the technique presented here can be achieved by 
defining other testing equivalence preserving reductions such as, for example, 
reductions based on partial order, or by extending the technique to deal with 
sub-expressions that are equal up to a substitution of generic terms as well. 
Abstract. Data races do not cover all kinds of concurrency errors. This 
paper presents a data-flow-based technique to find stale-value errors, 
which are not found by low-level and high-level data race algorithms. 
Stale values denote copies of shared data where the copy is no longer 
synchronized. The algorithm to detect such values works as a consis- 
tency check that does not require any assumptions or annotations of the 
program. It has been implemented as a static analysis in JNuke. The 
analysis is sound and requires only a single execution trace if imple- 
mented as a run-time checking algorithm. Being based on an analysis of 
Java bytecode, it encompasses the full program semantics, including ar- 
bitrarily complex expressions. Related techniques are more complex and 
more prone to over-reporting. 



1 Introduction 

Multi-threaded, or concurrent, programming has become increasingly popular, 
despite its complexity [2]. The Java programming language [1] explicitly supports 
this paradigm while leaving the programmer a lot of freedom for utilizing it 
[16]. Multi-threaded programming, however, provides a potential for introducing 
intermittent concurrency errors that are hard to find using traditional testing. 
The main source of this problem is that a multi-threaded program may execute 
differently from one run to another due to the apparent randomness in the way 
threads are scheduled. Since testing typically cannot explore all schedules, some 
bad schedules may never be discovered. 

One kind of error that often occurs in multi-threaded programs is a data race, 
as defined below. Traditionally this term has denoted unprotected field accesses, 
which will be referred to as low-level data races. However, the absence of low- 
level data races still allows for other concurrency problems, such as high-level 
data races [3] and atomicity violations [11,22]. 

Both high-level data races and previously presented atomicity checks suffer 
from the fact that they show violations of common conventions, which do not 
necessarily imply the presence of a fault. High-level data races further suffer 
from the fact that data flow between protected regions (synchronized blocks 
in Java) is ignored. Our approach complements low-level and high-level data 
races by finding additional errors, while still being more precise than previous 
atomicity-based approaches. 

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 150-164, 2004. 

© Springer-Verlag Berlin Heidelberg 2004 
The algorithm was designed to avoid the need for a specification or an
exhaustive search of the program state space [10]. It detects stale-value
errors as defined by Burrows and Leino [8], augments existing approaches
concerning low-level and high-level data races, and can be employed in
conjunction with these analyses. The algorithm is fully automated, requiring no
user guidance beyond normal input. The actual fault analyzed does not have to
occur in the observed execution trace, which is why the algorithm is more
powerful than traditional testing techniques. The algorithm is very modular and
thus suitable for static analysis. Compared to other atomicity-based
approaches, it is simpler yet more precise because it captures data flow and
thus models the semantics of the analyzed programs more precisely. It is
related to [8] but models the full semantics of Java bytecode, including
arithmetic expressions. The checking algorithm is implemented as a dedicated
algorithm in JNuke [4]. Preliminary experiments show that it is about two
orders of magnitude faster than Burrows' prototype.



1.1 Low-Level Data Races 

The traditional definition of a (low-level) data race is as follows [19]: 

A data race can occur when two concurrent threads access a shared 
variable and when at least one access is a write, and the threads use no 
explicit mechanism to prevent the accesses from being simultaneous. 

Without synchronization, it is theoretically possible that the effect of a write 
operation will never be observed by other threads [14]. Therefore it is univer- 
sally agreed that low-level data races must be avoided. Several algorithms and 
tools have been developed for detecting low-level data races, such as the Eraser 
algorithm [19], which has been implemented in the Visual Threads tool [15]. 

The standard way to avoid low-level data races on a variable is to protect it
with a lock: all accessing threads must acquire this lock before accessing the
variable, and release it again after. In Java, methods can be defined as
synchronized, which causes a call to such a method to lock the current object
instance. Return from the method will release the lock. Java also provides an
explicit form of specifying the scope and lock of synchronization using
synchronized(lock){stmt}, for taking a lock on object lock, and executing
statement stmt protected under that lock. If the methods that access the
variable are declared synchronized, the low-level data race cannot occur.
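
The two forms of locking described above can be sketched as follows. The class
SyncCounter, its fields, and the helper run are hypothetical names introduced
only for this illustration; the sketch assumes nothing beyond standard Java
monitors and threads.

```java
// Illustrative sketch; the class and its members are hypothetical names.
final class SyncCounter {
    private final Object lock = new Object();
    private int a; // guarded by the monitor of 'this'
    private int b; // guarded by 'lock'

    // Synchronized method: locks the current object instance for the whole call.
    synchronized void incA() { a++; }
    synchronized int getA() { return a; }

    // Synchronized block: explicit lock object and explicit scope.
    void incB() { synchronized (lock) { b++; } }
    int getB() { synchronized (lock) { return b; } }

    /** Runs 'threads' threads, each performing 'n' increments of both fields. */
    static int[] run(int threads, int n) {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int k = 0; k < n; k++) {
                    c.incA();
                    c.incB();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { c.getA(), c.getB() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

Because every access to a field happens under the same lock, no increment is
lost and both totals equal threads times n, whatever the schedule.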



1.2 High-Level Data Races 

A program may contain a potential for concurrency errors, even when it is free 
of low-level data races. The notion of high-level data races refers to sequences 
in a program where each access to shared data is protected by a lock, but the 
program still behaves incorrectly because operations that should be carried out 
atomically can be interleaved with conflicting operations [3]. 
public void swap() {
    int oldX;
    synchronized (lock) {
        oldX = coord.x;
        coord.x = coord.y; // swap X
        coord.y = oldX;    // swap Y
    }
}

public void reset() {
    synchronized (lock) {
        coord.x = 0;
    } // inconsistent state (0, y)
    synchronized (lock) {
        coord.y = 0;
    }
}



Fig. 1. A high-level data race resulting from three atomic operations. 



Figure 1 shows such a scenario. While the swap operation on the two
coordinates x and y is atomic, the reset operation is not. Because the lock is
released after setting x to 0, other threads may observe state (0, y), an
intermediate state, which is inconsistent. If swap is invoked by another thread
before reset finishes, this results in final state (y, 0). This is inconsistent
with the semantics of swap and reset. The view consistency algorithm finds such
errors [3].
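
The problematic schedule can be replayed deterministically. The sketch below
is only an illustration under stated assumptions: the class HighLevelRace and
its methods are hypothetical, and instead of starting real threads it executes
the three atomic regions of Figure 1 in the order in which a bad schedule
would interleave them.

```java
// Illustrative replay of one bad schedule; all names are hypothetical.
final class HighLevelRace {
    private final Object lock = new Object();
    private int x;
    private int y;

    HighLevelRace(int x, int y) {
        this.x = x;
        this.y = y;
    }

    void swap() {
        synchronized (lock) {
            int oldX = x;
            x = y;
            y = oldX;
        }
    }

    // reset() of Figure 1 split into its two separately locked halves.
    void resetX() { synchronized (lock) { x = 0; } }
    void resetY() { synchronized (lock) { y = 0; } }

    /** Replays: first half of reset, swap by another thread, second half of reset. */
    static int[] interleaved(int x, int y) {
        HighLevelRace c = new HighLevelRace(x, y);
        c.resetX(); // state (0, y): the lock is released here
        c.swap();   // another thread swaps: state (y, 0)
        c.resetY(); // y is set to 0 again: state stays (y, 0)
        synchronized (c.lock) {
            return new int[] { c.x, c.y };
        }
    }

    public static void main(String[] args) {
        int[] state = interleaved(3, 5);
        System.out.println("(" + state[0] + ", " + state[1] + ")"); // prints (5, 0)
    }
}
```

Every field access is protected by the lock, so there is no low-level data
race, yet the final state is (y, 0) rather than (0, 0).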



1.3 Atomic Sequences of Operations 

The absence of low-level and high-level data races still allows for other concur- 
rency errors. Figure 2 shows such an error: The increment operation is split into 
a read access, the actual increment, and a write access. Consider two threads, 
where one thread has just obtained and incremented the shared field. Before 
the updated value is written back to the shared field, another thread may call 
method inc and read the old value. After that, both threads will write back 
their result, resulting in a total increment of only one rather than two. 

The problem is that the entire method inc is not atomic, so its outcome may 
be unexpected. Approaches based on reduction [11,22] detect such atomicity vio- 
lations. The algorithm presented here uses a different approach but also detects 
the error in the example. Moreover, it is conceptually simpler than previous 
atomicity-based approaches and at the same time more precise. 

public void inc() {
    int tmp;
    synchronized (lock) {
        tmp = shared.field;
    } // lock release
    tmp++;
    synchronized (lock) {
        shared.field = tmp;
    }
}



Fig. 2. A non-atomic increment operation. 
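
The lost update described in this section can be reproduced without relying on
the scheduler. The following sketch is an assumption-laden illustration, not
part of the original tool: the class LostUpdate is hypothetical, and the steps
of two calls to inc are written out by hand in the order of the bad schedule.

```java
// Illustrative replay of two interleaved calls to inc(); names are hypothetical.
final class LostUpdate {
    static final class Shared {
        int field;
    }

    /** Returns the final value of the shared field after two interleaved increments. */
    static int interleaved(int start) {
        Shared shared = new Shared();
        shared.field = start;
        Object lock = new Object();
        int tmpA;
        int tmpB;

        synchronized (lock) { tmpA = shared.field; } // thread A reads
        tmpA++;                                      // thread A increments its copy
        synchronized (lock) { tmpB = shared.field; } // thread B reads the old value
        tmpB++;                                      // thread B increments its copy
        synchronized (lock) { shared.field = tmpA; } // thread A writes back
        synchronized (lock) { shared.field = tmpB; } // thread B overwrites with the same value

        return shared.field;
    }

    public static void main(String[] args) {
        System.out.println(interleaved(0)); // prints 1, although inc ran twice
    }
}
```

Both copies are derived from the same old value, so the total increment is one
rather than two, exactly the outcome described for Figure 2.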
1.4 Outline 

Section 2 gives the intuition behind our algorithm. Section 3 formalizes the 
property to be checked, and Section 4 extends the algorithm to nested locks and 
recursion. The precision of this new algorithm is discussed in Section 5. Section 
6 discusses related work. Section 7 shows initial experiments, and Section 8 
concludes. 



2 Our Data-Flow-Based Algorithm 



The intuition behind this algorithm is as follows: Actions using shared data, 
which are protected by a lock, must always operate on current values. Shared 
data is stored on the heap in shared fields, which are globally accessible. Correct 
synchronization ensures that each access to such shared fields is exclusive. Hence 
shared fields protected by locks always have current values. 

These values are accessed by different threads and may be copied when per- 
forming operations such as an addition. Storing shared values in local variables 
is common practice for complex expressions. However, these local variables re- 
tain their original value even when a critical (synchronized) region is exited; 
they are not updated when the global shared field changes. If this happens, the 
local variable will contain a stale value [8] which is inconsistent with the global 
program state. 
public void inc() {
    int tmp;
    synchronized (lock) {
        tmp = shared.field;  // shared data is used locally
    }
    tmp++;
    synchronized (lock) {
        shared.field = tmp;  // local data is used in another shared operation
    }
}



Fig. 3. Intuition behind our algorithm. 



Figure 3 shows how the error from the previous example is discovered by 
our new algorithm. A shared field is assigned to a local variable tmp, which 
is again used later, outside the synchronized block. The value of the shared 
field thus “escapes” the synchronized block, as indicated by the first arrow. 
While the lock is not held, other threads may update the shared field. As soon 
as the original thread continues execution (in computations, method calls, or 
assignments), effects of its actions may depend on a stale value. The second 
arrow indicates the data flow of the stale value.

Note that we use an uncommon notion of escape analysis. Usually escape
analysis is concerned with references escaping from a certain scope or region [5,
7,9,23]. In our algorithm, escaping values are considered, not just references, and
the scopes of interest are synchronized blocks.

The lack of a single synchronization scope for the entire sequence of oper- 
ations is responsible for having stale values. Hence, if the entire method had 
been synchronized, it would have consisted of a single block, which would have 
executed atomically. Our algorithm uses existing synchronized blocks to verify 
whether shared data escapes them. It therefore requires synchronization to be 
present for accesses to shared fields. The assumption that each field access itself 
is properly guarded against concurrent access can be verified using Eraser [19]. 
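
The effect of the missing common synchronization scope can be replayed deterministically. The following Java sketch is not taken from the paper's implementation: it is a single-threaded simulation in which the steps of two logical threads A and B, each executing the inc() method of Figure 3, are interleaved by hand. The names Shared, LostUpdateDemo, and replay are hypothetical.

```java
// Deterministic replay of the interleaving that makes tmp stale.
// All names are illustrative; no real threads are started.
final class Shared {
    int field;
}

final class LostUpdateDemo {
    static final Object lock = new Object();

    /** Replays: A reads, B runs a complete inc(), A writes back. */
    static int replay(Shared shared) {
        int tmpA;
        synchronized (lock) {          // thread A, first block
            tmpA = shared.field;       // shared value escapes into tmpA
        }

        // The lock is free here, so thread B can run a complete inc().
        int tmpB;
        synchronized (lock) {
            tmpB = shared.field;
        }
        tmpB++;
        synchronized (lock) {
            shared.field = tmpB;
        }

        tmpA++;                        // A computes on a stale value
        synchronized (lock) {          // thread A, second block
            shared.field = tmpA;       // overwrites B's update
        }
        return shared.field;
    }

    public static void main(String[] args) {
        Shared s = new Shared();
        System.out.println("field after two increments: " + replay(s));
    }
}
```

Starting from a field value of 0, the replay leaves the field at 1 although two increments were executed, because thread A writes back a value computed from a stale copy. Every individual field access is protected by the lock, so a check of the locking discipline alone would not flag this code.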

Like Eraser and the high-level data race algorithm [3], our new algorithm only
requires one execution trace if implemented as a run-time verification algorithm. 
Furthermore, the property is entirely thread-local. A static implementation of 
the algorithm is therefore straightforward. If aliases of locks are known, method- 
local static analysis can verify the desired property for each method while re- 
quiring only summary information about other methods. Static analysis has the 
advantage of being able to symbolically examine the entire program space. 

A dynamic analysis on the other hand has precise information about aliases of 
locks. However, a particular execution typically cannot cover the entire behavior 
of a program. Even though the probability of actually observing erroneous states 
in a multi-threaded program is small, dynamic analysis algorithms are often 
capable of detecting a potential error even if the actual error does not occur [3, 
19]. The reason is that the property which is checked against (such as locking 
discipline) is stronger than the desired property (e.g. the absence of a data race). 
The algorithm presented here also falls into that category. 

3 Formalization of Our Algorithm 

This section gives a precise formalization of our algorithm. The algorithm is 
explained without going into details about nested locks and method calls. These 
two issues are covered in the next section. 

In Java, each method invocation frame contains an array of variables known 
as its local variables and a fixed-size stack holding its stack variables. These two 
kinds of variables are always thread-local [14]. Both kinds of variables will be 
referred to as registers r. A shared field f will denote a field of a dynamic object
instance which is accessed in a shared context, using lock protection. 

A monitor block encompasses a range of instructions: Its beginning is the lock 
acquisition (monitorenter) of a new lock. Its end is marked by the corresponding 
lock release (monitorexit). It is assumed that lock acquisitions and releases are 
nested as required by the Java semantics [1]. Each monitor block has a unique 
ID b distinguishing individual lock acquisitions. Reentrant lock acquisitions and 
releases have no effect on mutual exclusion and are ignored. 

A register is shared when it contains the value of a shared field f and unshared
otherwise. When shared, the monitor block in which the shared value originated
is also recorded. The state s(r) = (sh, b) of a register comprises its shared status
sh ∈ {0, 1} and its monitor block ID b. The current monitor block b_curr is the
block corresponding to the latest non-reentrant lock acquisition.

At the beginning of execution, all registers are unshared. There are two
possibilities to obtain a shared value: First, a getfield instruction within a monitor
block will produce a shared value. Second, a method invocation may return a
shared value. Shared values have state (1, b_curr). We will use two auxiliary
functions returning the first and second part of a state s, respectively: shared(s) and
monitorblock(s).

Each assignment of a value will carry over the state of the assigned value.
Operations on several values will result in a shared value if any of the operands
was shared. A register r is used by an instruction i, r ∈ used(i), if it is read by it.
If r is a stack element, the corresponding stack argument is consumed when it is
read, according to the Java bytecode semantics [14]. If r is a local variable,
reading it does not have any further effect. Note that this definition of usage includes
expressions and arithmetic operations. In expression tmp2 = tmp1 + tmp0, the
result tmp2 is shared if any of the operands tmp1 or tmp0 is shared.

A stale value is a value of a shared register that originated from a different
monitor block than where it is used. This can be formalized as follows: A program
uses no stale values iff, for each program state and each shared register r used by
the current instruction i, the monitor block of that register, monitorblock(s(r)),
is equal to the current monitor block:

$$\forall i, r \cdot \bigl(r \in \mathit{used}(i) \land \mathit{shared}(s(r))\bigr) \rightarrow \bigl(\mathit{monitorblock}(s(r)) = b_{\mathit{curr}}\bigr)$$

If a single operation uses several shared values with different monitor block
IDs b, then at least one of them must be a stale value. This property is then
violated, and the result of that operation is again a shared value.¹ We will refer
to this property as block-local atomicity. If it holds for the entire program, then
actions based on shared data will always operate on current data.
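
The check defined in this section can be sketched as a small event-driven checker. This is a minimal illustration under stated assumptions, not the JNuke implementation: registers are identified by hypothetical string names, the events monitorEnter, monitorExit, getField, and use stand in for the corresponding bytecode instructions, locks are not nested, and the result of a violating operation is reset to unshared, as the authors describe for their implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the block-local atomicity check for non-nested locks.
// Class, method, and register names are assumptions for illustration.
final class BlockLocalChecker {
    private static final int NO_BLOCK = 0;

    /** State s(r) = (sh, b) of a register. */
    private static final class State {
        final boolean shared;
        final int block;
        State(boolean shared, int block) {
            this.shared = shared;
            this.block = block;
        }
    }

    private static final State UNSHARED = new State(false, NO_BLOCK);

    private final Map<String, State> regs = new HashMap<>();
    private int nextBlockId = 1;
    private int current = NO_BLOCK;   // b_curr
    private int warnings = 0;

    void monitorEnter() { current = nextBlockId++; }  // fresh ID per acquisition
    void monitorExit()  { current = NO_BLOCK; }

    /** getfield inside a monitor block yields a shared value (1, b_curr). */
    void getField(String target) {
        regs.put(target, current == NO_BLOCK ? UNSHARED : new State(true, current));
    }

    /** target = op(operands); pass null as target for a pure use such as putfield. */
    void use(String target, String... operands) {
        boolean anyShared = false;
        boolean stale = false;
        for (String r : operands) {
            State s = regs.getOrDefault(r, UNSHARED);
            if (s.shared) {
                anyShared = true;
                if (s.block != current) {
                    stale = true;     // monitorblock(s(r)) != b_curr
                }
            }
        }
        if (stale) {
            warnings++;               // one warning per violating instruction
        }
        if (target != null) {
            // A violating result is reset to unshared to avoid repeated warnings.
            regs.put(target, (anyShared && !stale) ? new State(true, current) : UNSHARED);
        }
    }

    int warnings() { return warnings; }

    public static void main(String[] args) {
        BlockLocalChecker c = new BlockLocalChecker();
        c.monitorEnter(); c.getField("tmp"); c.monitorExit();  // tmp = shared.field
        c.use("tmp", "tmp");                                   // tmp++ outside the block
        c.monitorEnter(); c.use(null, "tmp"); c.monitorExit(); // shared.field = tmp
        System.out.println("warnings: " + c.warnings());
    }
}
```

On the event sequence of Figure 2, the sketch issues exactly one warning, at tmp++, which is the first use of the escaped value. A loop that reads, computes, and writes back inside a single synchronized block produces no warning, since every use sees the ID of the current block.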



4 Extension to Nested Locks and Recursion 

The assumption behind dealing with nested locks is that any locks taken
beyond the first one are necessary to ensure mutual exclusion in the nested
synchronized blocks. This is a natural assumption arising from the program
semantics: nested locks are commonly used to access shared fields of different
objects, which use different locks for protection. Let l_outer and l_inner denote an
outer and an inner lock, respectively. Assume a thread acquires l_inner when
already holding l_outer. It then accesses a shared field f holding both locks. After
releasing l_inner, the shared field is no longer protected by that nested lock and
may thus be updated by other threads. Any use of the value of f outside the
nested lock l_inner is therefore a use of a stale value and violates block-local
atomicity.

¹ In our implementation we marked the result of any such operation as unshared. The
operation already generates a warning. Resetting the state of that register prevents
generating more than one warning for any stale value.

Low-level data race detection like Eraser misses this error, because each field
access operation is properly protected. Block-local atomicity detects that the
shared value becomes stale outside the inner monitor block. The following
treatment of nested locks covers such errors: The algorithm declares a separate
monitor block for each nested lock. If any operation outside the inner block uses a
shared value such as f, this will be detected by the consistency check defined in
the previous section.

Using the shared data from f outside the inner block would only be safe if
l_inner was superfluous: If l_inner was always used only in conjunction with l_outer,
then l_inner would not contribute to protection against concurrent access. Instead
the extra lock would constitute an overhead that should be eliminated, and the
warning issued by our algorithm can help to identify this problem.

Because a new monitor block is used with each lock acquisition, the total
number of locks held when acquiring a new lock l_inner is not relevant. Thus the
idea generalizes to a set of outer locks L_outer instead of a single outer lock l_outer.
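
One way to picture this treatment is a stack of monitor block IDs, with a fresh ID pushed for each non-reentrant acquisition. The sketch below is an assumption about how such bookkeeping could look, not the authors' code; NestedBlockTracker and its methods are hypothetical names, and lock releases are assumed to be properly nested, as Java requires.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

// Tracks the current monitor block b_curr in the presence of nested and
// reentrant lock acquisitions. Names are illustrative only.
final class NestedBlockTracker {
    private final Deque<Integer> openBlocks = new ArrayDeque<>();
    // Locks are compared by identity, like Java monitors.
    private final Map<Object, Integer> holdCount = new IdentityHashMap<>();
    private int nextBlockId = 1;

    void monitorEnter(Object lock) {
        int count = holdCount.merge(lock, 1, Integer::sum);
        if (count == 1) {
            openBlocks.push(nextBlockId++);  // non-reentrant: new monitor block
        }                                    // reentrant acquisitions are ignored
    }

    void monitorExit(Object lock) {
        int count = holdCount.merge(lock, -1, Integer::sum);
        if (count == 0) {
            holdCount.remove(lock);
            openBlocks.pop();                // back to the enclosing block, if any
        }
    }

    /** Returns b_curr, or 0 when no lock is held. */
    int currentBlock() {
        return openBlocks.isEmpty() ? 0 : openBlocks.peek();
    }
}
```

A value read while both locks are held carries the ID of the inner block. Once the inner lock is released, currentBlock() returns the ID of the outer block again, so a later use of that value fails the equality check from Section 3 even though the outer lock is still held.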

When dealing with method calls, only the effect of data flow and
synchronized blocks has to be considered. In run-time analysis, this is
implemented trivially as method calls do not have to be treated specially.² In static
analysis, method calls are essentially inlined, using only summary information 
of called methods. If no new synchronization is used by the called method, the 
method call has no special effect and behaves like a local operation. Otherwise, 
if a new (non-reentrant) lock is used by the callee, the return value will be 
shared with a new unique monitor block ID. Hence the return value of a call 
to a synchronized method is shared, unless the caller itself used the same lock 
during the call, which would make the inner lock merely reentrant. 

Because of this treatment of nested locks, handling inter-method data flow 
is quite natural and very efficient. The analysis does not have to consider call- 
ing contexts other than the lock set held. A context-insensitive variant of the 
algorithm is easily created: One can simply assume that any locks used in called 
methods are distinct. The algorithm will still be sound but may emit more false 
warnings. The same assumption can be used if the effect of a called method is 
unknown, e.g. when a method is native. 
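
The decision taken at a call site can be summarized in a few lines. The following is a sketch under simplifying assumptions: locks are represented by alias-resolved string names, a callee summary is just the set of locks the callee may acquire, and a null summary stands for an unknown (for example native) method. CallEffect and returnsNewSharedValue are hypothetical names.

```java
import java.util.Set;

// Decides whether the return value of a call is shared with a new monitor
// block ID, following the rules described in the text. Illustrative only.
final class CallEffect {
    /**
     * @param heldByCaller     locks held by the caller at the call site
     * @param acquiredByCallee locks the callee may acquire (its summary),
     *                         or null if the callee is unknown
     * @param contextSensitive false selects the context-insensitive variant
     */
    static boolean returnsNewSharedValue(Set<String> heldByCaller,
                                         Set<String> acquiredByCallee,
                                         boolean contextSensitive) {
        if (acquiredByCallee == null) {
            return true;    // unknown effect: assume a monitor block of its own
        }
        if (acquiredByCallee.isEmpty()) {
            return false;   // no new synchronization: like a local operation
        }
        if (!contextSensitive) {
            return true;    // assume all locks used by the callee are distinct
        }
        // Shared unless every acquisition in the callee is merely reentrant.
        return !heldByCaller.containsAll(acquiredByCallee);
    }
}
```

The context-insensitive branch never returns false where the context-sensitive one returns true, which is why that variant can only add warnings and never suppresses one.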

Finally, in a static implementation of the algorithm, the temporary lock
release in a wait() operation has to be modeled explicitly [8]. For run-time
verification in JNuke, the lock release event is implicitly generated by its run-time
verification API [4].



5 Precision and Limitations of Our Algorithm 

If a program is free of data races, our algorithm finds all stale values but may issue
false warnings. Atomicity-based approaches, including this one, are sometimes
too strict because certain code idioms allow that the globally visible effect of
a non-atomic operation corresponds to an atomic execution. Serializability is a
more precise property, but even non-serializable programs can be correct.

² A call to a synchronized method is treated like a block using synchronized(this).



5.1 Soundness and Completeness 

Our algorithm assumes that no low-level data races are present. This kind of error 
can be detected by algorithms like Eraser [19]. If a program is free of (low-level) 
data races then our static analysis algorithm is sound; no faults are missed. In a 
static approximation of this analysis, however, the alias information about locks 
is not always known. If one assumes each lock acquisition utilizes a different 
lock, the algorithm remains sound but becomes more prone to overreporting. 
Furthermore, soundness is also preserved if it is assumed that any unknown 
method called returns a shared value belonging to a monitor block of its own. 
If the algorithm is implemented dynamically, then soundness depends on the 
quality of a test suite and can usually not be guaranteed. 

False positives may be reported if too many distinct monitor blocks are
created by the analysis. A possible reason is the creation of more locks than actually
necessary to ensure mutual exclusion. However, if synchronization primitives
are only used when necessary, the algorithm will not report false positives
in the following sense: each reported usage of a shared value in a
different monitor block actually corresponds to the use of a stale value.

5.2 Precision Compared to Previous Atomicity-Based Approaches 

Block-local atomicity is more precise than method-level atomicity as used by 
previous approaches [11,13,18,22]. These approaches check for the atomicity of 
operations and assume that each method must execute atomically. This is too 
strict. Non-atomic execution of a certain code block may be a (welcome) opti- 
mization allowing for increased parallelism. Our algorithm detects whether such 
an atomicity violation is benign or results in stale values. Furthermore, it does 
not require assumptions or annotations about the desired scope of atomicity. 

Our algorithm uses data flow to decide which regions must necessarily be 
atomic. At the same time, the analysis determines the size of atomic regions. 
Therefore block-local atomicity reports any errors found by earlier atomicity- 
based approaches but does not report spurious warnings where no data flow 
exists between two separated atomic regions. 

Figure 4 shows an example that illustrates why our algorithm is more precise. 
A program consists of several threads. The one shown in the figure updates a 
shared value once a second. For instance, it could read the value from a sensor 
and average it with the previously written value. It then releases the lock, so 
other threads can access and use this value. A reduction-based algorithm will 
(correctly) conclude that this method is not atomic, because the lock is released 
during each loop iteration. However, as there is no data flow between one loop 
iteration and the next one, the program is safe. Our algorithm analyzes the 
program correctly and does not emit a warning.

void sensorDaemon() {
    while (true) {
        synchronized (lock) {
            value = shared.field; // acquire latest copy
            value = func(value);
            shared.field = value; // write back result
        }
        sleep(1000); // wait
    }
}

Fig. 4. The importance of data flow analysis for synchronized blocks. 



5.3 Limitations of Atomicity-Based Approaches 

The strict semantics of atomic operations and block-local atomicity are not al- 
ways required for a program to be correct. This creates a potential for warnings 
about benign usages of stale values. An example is a logging class using lax syn- 
chronization: It writes a local copy of shared data to its log. For such purposes, 
the most current value may not be needed, so block-local atomicity is too strict. 

Finally, conflicts may be prevented using higher-level synchronization. For 
instance, accesses can be separated through thread start or join operations 
[15]. This is the most typical scenario resulting in false positives. Note that other 
atomicity-based approaches will always report a spurious error in such cases as 
well. The segmentation algorithm can eliminate such false positives [15].



5.4 Serializability 

Even without higher-level synchronization, block-local atomicity is sometimes 
too strong as a criterion for program correctness. Serializability is a weaker but 
still sufficient criterion for concurrent programs [10]. Nevertheless, there are cases 
involving container structures where a program is correct, but neither atomic nor 
serializable. Consider Figure 5, where a program reads from a buffer, performs a
calculation, and writes the result back. Assume buffer.next() always returns
a valid value, blocking if necessary. After a value has been returned, its slot is
freed, so each value is used only once. Method buffer.add() is used to record
results. The order in which they are recorded does not matter in this example.

The program is correct because the calculation does not
depend on a stale shared value; “ownership” of the value is transferred to the
current thread when it is consumed by calling buffer.next(). Thus the value
becomes thread-confined and is no longer shared. This pattern is not captured
by our data flow analysis but is well-documented as the “hand-over protocol”
[16]. It could be addressed with an extension to the approach presented here,
which checks for thread-local confinement of data.

public void work() {
    int value, fdata;
    while (true) {
        synchronized (lock) {
            value = buffer.next();
        }
        fdata = f(value); // long computation
        synchronized (lock) {    // Data flow from previous block!
            buffer.add(fdata);   // However, the program is correct because
        }                        // the buffer protocol ensures that the
                                 // returned data remains thread-local.
    }
}



Fig. 5. A correct non-atomic, non-serializable program. 



6 Related Work 

Our algorithm builds on previous work on data races. It has been designed to 
detect errors that are not found by data race analysis. The algorithm is related 
to previous work on atomicity violations but is an independent approach to that 
problem. The data flow analysis used in our algorithm is at its core an escape 
analysis, although it uses different entities and scopes for its analysis. 



6.1 Data Races 

Low-level data races denote access conflicts when reading or writing individual 
fields without sufficient lock protection [19]. For detecting data races, the set
of locks held when accessing shared fields is checked. High-level data races turn 
this idea upside down and consider the set of fields accessed when holding a 
lock. View consistency serves as a consistency criterion to verify whether these 
accesses are semantically compatible [3]. 
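
The lock-set idea behind low-level data race detection can be illustrated as follows. This is a deliberately simplified sketch in the style of Eraser [19]: it only performs the intersection of candidate lock sets and omits the state machine that Eraser uses for initialization and read-sharing. Field and lock names are plain strings, and LocksetChecker is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified lock-set refinement: for each shared field, keep the set of
// locks that were held on every access seen so far. Illustrative only.
final class LocksetChecker {
    private final Map<String, Set<String>> candidates = new HashMap<>();

    /**
     * Records an access to a field while the given locks are held.
     * Returns false if no single lock protects all accesses seen so far,
     * which indicates a potential low-level data race.
     */
    boolean access(String field, Set<String> locksHeld) {
        Set<String> c = candidates.get(field);
        if (c == null) {
            c = new HashSet<>(locksHeld);   // first access: all held locks
            candidates.put(field, c);
        } else {
            c.retainAll(locksHeld);         // intersect with current lock set
        }
        return !c.isEmpty();
    }
}
```

A check of this kind accepts the increment method of Figure 2, because the same lock is held on every access to the field. That is the gap block-local atomicity is meant to close.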

Block-local atomicity is a property which is independent of high-level data 
races. Figure 1 in the introduction showed that certain faults result in high-level 
data races but do not violate block-local atomicity. However, the reverse is also
possible, as shown in Figure 2, where no high-level data races occur, but stale 
values are present in the program. Hence the two properties are independent [22]. 
Both high-level data races and block-local atomicity build on the fact that the 
program is already free of underlying low-level data races, which can be detected 
by Eraser [19]. The intent behind block-local atomicity is to use it in conjunction 
with low-level and high-level data race analyses, because these notions do not 
capture atomicity violations.

6.2 Atomicity of Operations

Atomicity of operations is not directly concerned with data accessed within in- 
dividual critical (synchronized) regions, but with the question whether these 
regions are sufficiently large to guarantee atomic execution of certain operations. 
Atomicity is a desirable property in concurrent programs [11,13,18,22]. In con- 
junction with the absence of data races, program correctness with respect to 
concurrently accessed data can be guaranteed. 

The key idea is to reduce sequences of operations to serializable (atomic) 
actions based on the semantics of each action with respect to Lipton’s reduc- 
tion theory [17]. In Figure 2, the actions of the entire increment method can- 
not be reduced to a single atomic block because the lock is released within 
the method. The reduction-based atomicity algorithm verifies whether an entire 
shared method is atomic. Recent work includes a run-time checker that does not 
require annotations [11]. A different approach to verify the atomicity of methods 
extends the high-level data race algorithm with an extra scope representing the 
desired atomicity of a method [18]. Block-local atomicity is more precise than 
such previous approaches, as shown in section 5. At the same time it is concep- 
tually simpler, because modeling the data flow of instructions is much simpler 
than deciding whether a sequence of instructions is atomic. 

Atomicity by itself is not sufficient to avoid data corruption.³ However,
augmenting data race checks with an atomicity algorithm finds more errors than
either approach alone.

6.3 Stale Values 

The kind of error found by our algorithm corresponds to stale values as defined 
by Burrows and Leino [8] but is an independent approach to this question. Our 
algorithm compares IDs of monitor blocks to verify whether a register contains 
stale shared data. Their algorithm uses two flags stale and from-critical instead,
which must be updated whenever a register changes. Unlike their approach,
which is based on source code annotation, we model the semantics of Java byte- 
code directly. This covers the full semantics of Java, including method calls and 
arithmetic expressions. This allows us to discover potential non-determinism in 
program output, when registers are written to an output. Their approach misses 
such an error as it involves the use of a register in a method call. Furthermore, 
we have a dedicated checker for this property, which is orders of magnitude faster 
than their prototype, which uses the ESC/Java [12] framework that was targeted
to “more heavy-weight checking” [8]. 

6.4 Serializability 

Atomicity is sometimes too strong as a desired property. Atomic blocks are
always serializable, but correct programs may be serializable but not atomic
[10]. Serializability, while weaker than atomicity, still suffices to guarantee the
consistency of thread-local and global program states. Code idioms exist where
operations are performed on outdated values but still yield the same result as if
they had been performed on the current value, because of double-checking.

³ Flanagan and Qadeer ignored the Java memory model when claiming that low-level
data races are subsumed by atomicity [13].



public void do_transaction() {
    int value, fdata;
    boolean done = false;
    while (!done) {
        synchronized (lock) {
            value = shared.field;
        }
        fdata = f(value); // long computation
        synchronized (lock) {
            if (value == shared.field) {
                shared.field = fdata;
                // The usage of the locally computed fdata is safe because
                // the shared value is the same as during the computation.
                // Our algorithm and previous atomicity-based approaches
                // report an error (false positive).
                done = true;
            }
        }
    }
}



Fig. 6. A code idiom that cannot be analyzed with block-local atomicity. 



Figure 6, derived from [10], shows a code idiom suitable for long computations:
A shared value is read and stored locally. A complex function is then computed 
using the local copy. When the result is to be written back, the writing thread 
checks whether the computation was based on the current value. If this was the 
case, the result is written; otherwise the computation is repeated with a new 
copy of the shared value. Note that even in a successful computation, the shared 
value may have been changed in between and re-set to its original value. Thus 
this operation is non-atomic but still serializable, and therefore correct. 
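
The same read, compute, validate, and retry structure is familiar from lock-free code. As an analogy only, and not the lock-based idiom of Figure 6 itself, the sketch below expresses it with java.util.concurrent.atomic.AtomicInteger, whose compareAndSet method combines the comparison and the write into one step. OptimisticUpdate and update are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

// Optimistic update: compute on a local copy, write back only if the
// shared value still equals the copy, otherwise retry. Illustrative only.
final class OptimisticUpdate {
    static int update(AtomicInteger shared, IntUnaryOperator f) {
        while (true) {
            int value = shared.get();               // read and store locally
            int fdata = f.applyAsInt(value);        // long computation on the copy
            if (shared.compareAndSet(value, fdata)) {
                return fdata;                       // copy was still current
            }
            // Another thread changed the value in between: repeat.
        }
    }
}
```

As in Figure 6, a successful comparison does not prove that the shared value was untouched during the computation. It may have been changed and set back to its original value, which is harmless here because the result depends only on the value that was read.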

Atomicity-based approaches, including this one, will report an error in this 
case [11,21,22]. Flanagan’s definition of atomicity only entails visible effects of an 
operation; in this sense, the program is atomic but irreducible [10]. On the other 
hand, the program violates block-local atomicity because its action is not atomic. 
It is merely wrapped in a (potentially endless) loop that creates the impression 
of atomicity. There is currently no approach other than model checking [20] to 
decide whether a program is serializable. This observation does not diminish the
value of our algorithm because model checking is still much too expensive to be
applied to large programs [4,10,20].

6.5 Escape Analysis 

Our data flow analysis is related to escape analysis, see [5,7,9,23] and the more 
recent [6], in the sense that it determines whether some entity escapes a region 
of interest. In our case entities are values (primitive as well as references), and 
the regions are synchronization sections. For example, if the content of a field
x is 5 and this value is read inside a synchronized section, and then later used
outside this region, then that value has escaped. In traditional escape analysis 
on the other hand, typically entities are references to heap-allocated objects (not 
primitive values, such as an integer) and regions are methods or threads. In our 
case, the analysis is simpler because modeling the effect of each instruction on 
the stack and local variables is straightforward. 

7 Experiments 

A preliminary version of a static analyzer that checks for block-local atomicity 
has been implemented in JNuke [4]. So far it can only analyze small examples 
because the analyzer cannot yet load multiple class files. Due to this, relevant
parts of benchmark packages had to be merged into one class. The check for lock 
use in called methods was not yet implemented and was performed manually in 
trivial cases. These limitations will be overcome in the final version. 



Table 1. Comparison to other approaches: Number of warnings reported. 



Benchmark   Size [LOC]   PraunGross [18]   FlanaganFreund [11]   Block-local atomicity
Elevator    500          2                 2                     2
SOR         250          0                 0                     0
TSP         700          1                 7                     1



Besides a few hand-crafted examples used for unit testing, three benchmark 
applications [18] were analyzed: A discrete-event elevator simulator and two 
task-parallel applications, SOR (Successive Over-Relaxation over a 2D grid) and 
the Travelling Salesman Problem (TSP). Table 1 shows the results. The three
warnings issued by all approaches are benign: In the elevator example, the two 
warnings refer to a case where a variable is checked twice, similarly to the exam- 
ple in Figure 6. For TSP, the warning refers to an access inside a constructor, 
where the data used is still thread-local. 

Our approach necessarily reports no more atomicity violations than the run-time
checker from Flanagan and Freund [11]. This can be expected since method-local
atomicity implies block-local atomicity, and thus the number of violations
of method-local atomicity constitutes an upper bound to the number of
block-local atomicity violations. It remains to be seen how much the reports from our
algorithm will differ on larger benchmarks in comparison to [18].

Compared to a previous prototype checker for stale values [8], our checker 
is significantly faster. Burrows reported 2000 source lines (LOC) per minute on
unspecified hardware. JNuke checked a binary resulting from 500 lines in 0.02 s, 
on a Pentium 4 with a clock frequency of 2.8 GHz. Accounting for different 
hardware, a difference of about two orders of magnitude remains. 
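
The raw figures quoted above can be put side by side. This is a back-of-the-envelope calculation that assumes both measurements scale linearly with program size, which the text does not claim:

```latex
\[
\frac{2000\ \text{LOC}}{60\ \text{s}} \approx 33.3\ \text{LOC/s},
\qquad
\frac{500\ \text{LOC}}{0.02\ \text{s}} = 25\,000\ \text{LOC/s},
\qquad
\frac{25\,000}{33.3} \approx 750 \approx 10^{2.9}
\]
```

The unadjusted ratio is thus close to three orders of magnitude, which leaves room for roughly one order of magnitude attributable to hardware differences before the stated gap of about two orders is reached.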



8 Conclusions and Future Work 

We have presented a data-flow-based algorithm to detect concurrency errors 
that cannot be detected by low-level [19] or high-level [3] data races. Previous 
atomicity-based approaches were entirely based on the atomicity of operation 
sequences, but ignored data flow between synchronized blocks [11,18,22]. This 
results in cases where correct non-atomic methods are reported as faulty. The 
algorithm presented in this paper detects stale values [8]. This conceptually 
simpler and more precise property captures data flow between synchronized 
blocks. The property can be checked in a thread-modular, method-local way. It 
can be implemented as a static analysis or as a run-time checking algorithm. 

Future work includes investigating the relationship to Burrows’ algorithm in 
more depth. Our algorithm currently issues a warning when a stale register is 
used even though the use of such a snapshot may be benign. Burrows’ more 
relaxed reporting could be more useful for practical purposes. Extensions to the
algorithm include coverage of thread-locality of data and higher-level segmen- 
tation [15] of events. It remains to be seen how easily the algorithm translates 
into a run-time implementation. A major challenge for a run-time analysis of 
this algorithm is the fact that each instruction has to be monitored, creating 
an impossibly large overhead for instrumentation-based analysis. However, our 
JNuke framework is capable of handling efficient low-level listeners that could 
make a run-time algorithm feasible [4]. A static analysis may still be preferable
because aliasing information of locks can usually be approximated easily [2].



Acknowledgements. Thanks go to Christoph von Praun for the benchmark
applications and for quickly answering questions about the nature of the atom- 
icity violations in them. 



References 

1. K. Arnold and J. Gosling. The Java Programming Language. Addison-Wesley,
1996.

2. C. Artho and A. Biere. Applying static analysis to large-scale, multithreaded Java 
programs. In D. Grant, editor, Proc. 13th ASWEC, Canberra, Australia, 2001. 
IEEE Computer Society. 







C. Artho, K. Havelund, and A. Biere 



3. C. Artho, K. Havelund, and A. Biere. High-level data races. Journal on Software 
Testing, Verification & Reliability (STVR), 13(4), 2003. 

4. C. Artho, V. Schuppan, A. Biere, P. Eugster, M. Baur, and B. Zweimüller. JNuke:
efficient dynamic analysis for Java. In R. Alur and D. Peled, editors, Proc. CAV ’04,
Boston, USA, 2004. Springer.

5. B. Blanchet. Escape analysis for object-oriented languages; application to Java.
In Proc. OOPSLA ’99, pages 20-34, Denver, USA, 1999. ACM Press.

6. B. Blanchet. Escape analysis for Java, theory and practice. ACM Transactions on
Programming Languages and Systems, 25(6):713-775, November 2003.

7. J. Bogda and U. Hölzle. Removing unnecessary synchronization in Java. In
Proc. OOPSLA ’99, pages 35-46, Denver, USA, 1999. ACM Press.

8. M. Burrows and R. Leino. Finding stale-value errors in concurrent programs.
Technical Report SRC-TN-2002-004, Compaq SRC, Palo Alto, USA, 2002.

9. J. Choi, M. Gupta, M. Serrano, V. Sreedhar, and S. Midkiff. Escape analysis for
Java. In Proc. OOPSLA ’99, pages 1-19, Denver, USA, 1999. ACM Press.

10. C. Flanagan. Verifying commit-atomicity using model-checking. In Proc. SPIN
Workshop (SPIN’04), volume 2989 of LNCS, Barcelona, Spain, 2004. Springer.

11. C. Flanagan and S. Freund. Atomizer: a dynamic atomicity checker for multi-
threaded programs. SIGPLAN Not., 39(1):256-267, 2004.

12. C. Flanagan, R. Leino, M. Lillibridge, G. Nelson, J. Saxe, and R. Stata. Extended 
static checking for Java. In Proc. PLDI 2002, pages 234-245, Berlin, Germany, 
2002. ACM Press. 

13. C. Flanagan and S. Qadeer. Types for atomicity. In Proc. Workshop on Types in 
Language Design and Implementation (TLDI’03), New Orleans, USA, 2003. ACM 
Press. 

14. J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Virtual Machine Specifica-
tion, Second Edition. Addison-Wesley, 2000.

15. J. Harrow. Runtime checking of multithreaded applications with Visual Threads.
In Proc. SPIN Workshop (SPIN’00), volume 1885 of LNCS, Stanford, USA, 2000.
Springer.

16. D. Lea. Concurrent Programming in Java, Second Edition. Addison-Wesley, 1999.

17. Richard J. Lipton. Reduction: a method of proving properties of parallel programs. 
Commun. ACM, 18(12):717-721, 1975. 

18. C.v. Praun and T. Gross. Static detection of atomicity violations in object-oriented 
programs. In Proc. Formal Techniques for Java-like Programs, volume 408 of 
Technical Reports from ETH Zurich. ETH Zurich, 2003. 

19. S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: a
dynamic data race detector for multithreaded programs. ACM Trans. on Computer
Systems, 15(4), 1997.

20. W. Visser, K. Havelund, G. Brat, S. Park, and F. Lerda. Model checking programs.
Automated Software Engineering Journal, 10(2), April 2003.

21. C. von Praun and T. Gross. Object-race detection. In OOPSLA 2001, Tampa Bay,
USA, 2001. ACM Press.

22. L. Wang and S. Stoller. Run-time analysis for atomicity. In Proc. Run-Time
Verification Workshop (RV’03), volume 89(2) of ENTCS, Boulder, USA, 2003.
Elsevier.

23. J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java 
programs. In Proc. OOPSLA ’99, pages 187-206, Denver, USA, 1999. ACM Press. 




Abstraction-Based Model Checking Using 
Heuristical Refinement 



Kairong Qian and Albert Nymeyer 



School of Computer Science & Engineering 
The University of New South Wales 
UNSW Sydney 2052 Australia 
{kairongq, anymeyer}@cse.unsw.edu.au



Abstract. The major challenge in model checking for more than two 
decades has been dealing with the very large number of states that typ- 
ify industrial systems. Abstraction-based methods have been particu- 
larly successful in this regard. Heuristic-based methods that use domain 
knowledge to guide a model checker can also be effective in dealing with 
large systems. In this work, we present an abstraction and heuristic- 
based model checking algorithm (called Static Abstraction Guided model 
checking) that verifies the safety properties of a system. Unlike other 
abstraction-based approaches, this work proposes a model-checking al- 
gorithm that uses a sequence of abstract models as input, and a method 
to refine counterexamples to determine whether they are spurious or 
real. During this refinement, abstract models in the sequence are used as 
heuristics to guide the model checker. This tight integration of abstrac- 
tion and guidance is doubly effective in countering state explosion. This 
paper deals with the theoretical and algorithmic aspects of the approach 
only. 



1 Introduction 

Model checking can be used as a fully-automated technique to verify the sys- 
tem as a whole, as well as a debugging technique for system design. Ideally, 
model checkers should assist design engineers at different stages of design to 
detect defects quickly and accurately, as well as to verify the final system. In 
recent years, progress in model-checking research has led to a hope that versatile 
model checkers, capable of both verification and debugging, can be developed 
by combining formal methods and artificial intelligence techniques. While mod- 
ern model checkers can cope with quite large state spaces, the so-called “state 
space explosion” problem is still a major hurdle that prevents this technology 
from scaling up to industrial-size systems in general [1]. To combat the explosion
in the number of states, many approaches have been proposed in recent years,
including automata-based methods and symbolic model checking [2,3,1,4,5].

More recently, there has been growing interest [6,7,8] in the abstract
interpretation framework, which has long been the method for static analysis,
correctness proofs, code optimization, etc. for logic programs [9]. The essence
of this technique is that information that is irrelevant w.r.t. a certain set of
properties can be abstracted away from the original system, resulting in a much
smaller system that preserves the given properties. However, during the course
of abstraction, spurious defects may appear in the abstract system, where the
spuriousness of these defects is a manifestation of the loss of information. To
determine whether a defect reported by such a model checker is spurious or not,
one needs to further refine the abstract system (by adding more information) and
then to model check again. The nature of the refinement is guided by the defect
(in the form of the counterexample). This is often referred to as the
abstraction-refinement approach to model checking [10,11,12,13,14].

Model checking can be seen as a search problem. The search goal, character- 
ized by a temporal logic formula, is a particular set of vertices or paths on the 
graph that represents the underlying semantic model of the system. Heuristic 
search algorithms such as A* have been used for decades to efficiently and heuris- 
tically find goal states on graphs that are too large for brute-force techniques. 
One can view model checking as a graph-search problem and use heuristic search
techniques to avoid searching the entire state space [15,16,17,18,19,20,21,22].
However, the effectiveness of this approach is limited by the difficulty of finding
informed heuristics for model-checking problems in general. In [22], we used a
systematic method based on abstraction to extract useful heuristic information
from the original model. In that work, abstraction was viewed as a form
of system relaxation. In this work, we extend this idea further and use multiple
abstractions to guide the search algorithm of the model checker. We argue that
there should be a close relationship between the process of abstraction and the
guided approach to model checking as the two techniques are complementary.
This work provides a framework that allows this to be realized.

The rest of the paper is structured as follows. In Section 2 we review related
work, and in Section 3 we present our approach and highlight the differences
with the abstraction-refinement approach. The basic model-checking and
abstraction formalism is described in Section 4. In Section 5 we describe our
approach in detail, provide algorithms, and prove some theoretical results.
Implementation issues and conclusions are presented in Sections 6 and 7
respectively.



2 Related Work 



The well-known abstraction-refinement approach is depicted in Figure 1. Given
a concrete system (model) and a property that is to be verified, one first
constructs an initial abstract model that preserves the property. The standard
model checking procedure is then called to verify the abstract model w.r.t. the
property. According to the property-preserving abstraction methodology, the
model checker reports “property verified” only if the original model satisfies
the property, or alternatively “property failed”, in which case it generates a
counterexample. However, in the latter case, one cannot know whether there is a
real error in the system or whether the error is spurious and caused by the
abstraction.

Fig. 1. The abstraction-refinement approach



Determining whether a failure is spurious or not is non-trivial as one needs
to analyse the possibly very large concrete system. Clarke et al. [10,11] use
a SAT solver to extract the concrete path that corresponds to the (abstract)
counterexample. Glusman et al. [14] do likewise. If the counterexample is found
to be spurious, the abstraction needs to be refined. The common technique for
refinement is to analyze the spurious counterexample and add more information
to the model, with the aim of eliminating the counterexample [10,11,14,12].
A major difficulty in this approach is to separate (or split) an abstract
state (and in so doing make the model less abstract). Unfortunately, this is
provably an NP-complete problem [10]. In [11], techniques like integer linear
programming and decision tree learning have been used to split abstract states.
In [14], multiple counterexamples and SAT-based techniques have been used. Note
that these techniques can be seen as adaptive methods that abstract the model
dynamically during the abstraction-refinement process. The initial abstraction
is constructed by the user; the refinement of abstractions is guided by
counterexamples.

3 Our Approach 

In contrast to the dynamic refinement approach described above, our approach 
is based on a sequence of abstractions that are statically determined. Dynami- 
cally, we do not refine the abstraction, but instead refine the counterexamples 
to determine whether they are spurious or not. 

Our approach is depicted in Figure 2. We refer to the approach as the Static
Abstraction Guided (SAG) model checking algorithm. Notice that we have a
sequence of abstractions on the left-hand side of the figure. We start with the
initial abstraction $M_1$ (the coarsest) and use a standard forward model checking
algorithm [23] to compute all the paths through the state space that lead to an
error state. The union of all the states in these error paths we call the error
region, as it must contain the error state(s), if there are any. It is feasible
to determine all of the error paths because the size of the initial abstract
model is relatively small. These error paths are abstract counterexamples, and
are identical to those computed in the abstraction-refinement approach. However,
in the abstraction-refinement approach, an error path is (only) used to provide
assistance in refining the abstraction, and only a single counterexample is
computed.

Fig. 2. The Static Abstraction Guided model checking algorithm

In SAG, if we cannot find any error path in $M_1$, then we may conclude that
the model satisfies the safety property (we only consider property-preserving
abstractions in this work). Otherwise, we proceed to each of the finer levels of
abstraction $M_2, \ldots, M_n$. At each level, some (or even all) of the error
paths may prove to be spurious. If at some level of abstraction we find that no
error paths remain, then we have verified the complete concrete model,
irrespective of the abstraction level that we are currently at. Alternatively,
if there are still error paths remaining when we reach the concrete model, then
we have found ‘real’, concrete counterexamples.

The process of eliminating error paths is conducted using the heuristic search 
algorithm A*. As we step through the abstractions, we use the previous abstrac- 
tion as a heuristic for A*. In our scheme then, the abstraction and guided search 
approaches are closely entwined. In general terms, we differ from the abstraction- 
refinement approach in the following ways: 

1. Our approach uses a predefined set of abstractions. 

2. We compute all error paths in the (coarsest) abstract model, not just one. 

3. We do not refine the abstract model during model checking, but the error 
paths themselves, and eliminate those that are spurious. 

4. The heuristic search algorithm A*, which uses the coarser abstract model as
heuristic at each level, is used to speed up the search as well as to optimally
compute the shortest error path.

4 Abstraction in Model Checking

In this work, we use finite-state labelled transition systems (LTSs) as our
underlying semantic model. To simplify the discussion, we modify the standard
definition and add a set of states $E$, which is the set of erroneous states
that violate the safety property. As usual, let $AP$ be a finite set of atomic
propositions and assume $\neg p \in AP$ iff $p \in AP$.

Definition 1 (LTS). A finite state labelled transition system is a 5-tuple
$M = (S, S_0, R, L, E)$, where

- $S$ is a finite set of states

- $S_0 \subseteq S$ is a set of initial states

- $R \subseteq S \times S$ is a transition relation

- $L \subseteq S \times 2^{AP}$ is a labelling function that maps a state to a
  set of atomic propositions (labels)

- $E \subseteq S$ is the set of error states

In computation tree logic (CTL), a safety property is expressed as $AG\varphi$,
where $\varphi$ only consists of atomic propositions, i.e. $\varphi$ does not
include temporal operators. The set $E$ is defined to be
$\{s \mid s \in S \wedge s \not\models \varphi\}$, which is the set of “bad”
states that violate the safety properties. Our definition of an LTS hence
not only defines the system behaviours, but also captures the safety properties:
the safety properties are bound to the model.
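
As an illustration of Definition 1, the following minimal Python sketch represents an LTS as a record and derives the error set from a state predicate. The names `LTS`, `make_lts`, and the 2-bit counter example are hypothetical and chosen here for illustration; the closure of $AP$ under negation is not modelled.

```python
from dataclasses import dataclass

@dataclass
class LTS:
    states: frozenset    # S
    init: frozenset      # S_0
    trans: frozenset     # R, as a set of (source, target) pairs
    label: dict          # L, mapping each state to its set of propositions
    error: frozenset     # E

def make_lts(states, init, trans, label, phi):
    """Build an LTS whose error set E collects the states violating phi.

    phi is a predicate over a state's label set (no temporal operators)."""
    error = frozenset(s for s in states if not phi(label[s]))
    return LTS(frozenset(states), frozenset(init), frozenset(trans),
               dict(label), error)

# Hypothetical 2-bit counter: state 3 carries the proposition "overflow".
label = {s: ({"overflow"} if s == 3 else set()) for s in range(4)}
counter = make_lts(
    states=range(4),
    init={0},
    trans={(s, (s + 1) % 4) for s in range(4)},
    label=label,
    phi=lambda props: "overflow" not in props,
)
```

Here the safety property "never overflow" is bound to the model as the error set containing only state 3.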

Definition 2 (Error Path). In an LTS $M = (S, S_0, R, L, E)$, we say any path
$s_0, s_1, \ldots, s_{n-1}$ such that
$s_0 \in S_0 \wedge (s_i, s_{i+1}) \in R \wedge s_{n-1} \in E$ is an error
path, denoted $\pi$.

Definition 3 (Safety Verification). Given an LTS $M$ with a safety property
$AG\varphi$, $M \models AG\varphi$ iff there does NOT exist any error path
$\pi$ in $M$.

Let $H$ be a surjective function that maps a set of states $S$ in $M$ onto
another set of states $\hat{S}$ such that $|\hat{S}| < |S|$.

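Definition 4 (Abstraction). The abstraction of a concrete LTS $M$ w.r.t. $H$
is also an LTS $\hat{M} = (\hat{S}, \hat{S}_0, \hat{R}, \hat{L}, \hat{E})$, where

- $\hat{S} = \{\hat{s} \mid s \in S \wedge \hat{s} = H(s)\}$

- $\hat{S}_0 = \{\hat{s} \mid \hat{s} \in \hat{S} \wedge \hat{s} = H(s) \wedge s \in S_0\}$

- $\hat{R} \subseteq \hat{S} \times \hat{S}$ is a transition relation, where
  $(\hat{s}_1, \hat{s}_2) \in \hat{R}$ iff
  $\exists s_1 \exists s_2\, (\hat{s}_1 = H(s_1) \wedge \hat{s}_2 = H(s_2) \wedge (s_1, s_2) \in R)$

- $\hat{L} \subseteq \hat{S} \times 2^{AP}$, where
  $\hat{L}(\hat{s}) = \bigcup L(s_i)$ for all $s_i \in H^{-1}(\hat{s})$

- $\hat{E} = \{\hat{e} \mid \hat{e} \in \hat{S} \wedge \hat{e} = H(e) \wedge e \in E\}$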

This definition essentially specifies a homomorphic abstraction. Note that 
this is very important as we shall use this abstraction as an admissible heuristic 
when we use A* to guide the search for error paths. It is proved in [22] that a 
homomorphic abstraction provides a lower bound estimate of the path length in 
the corresponding concrete system. 
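
A minimal Python sketch of such a homomorphic abstraction is given below. It works on plain sets rather than a full LTS and omits the labelling function; the function name `abstract`, the 4-bit counter, and the choice of encoding the bit vector as an integer whose least significant bit is the fourth bit are all assumptions made for illustration.

```python
def abstract(states, init, trans, error, H):
    """Quotient an LTS (given as plain sets) under the state mapping H."""
    a_states = {H(s) for s in states}
    a_init = {H(s) for s in init}
    # An abstract transition exists whenever some concrete transition maps to it.
    a_trans = {(H(s), H(t)) for (s, t) in trans}
    a_error = {H(e) for e in error}
    return a_states, a_init, a_trans, a_error

# Hypothetical concrete system: a 4-bit counter with states 0..15.
states = set(range(16))
init = {0}
trans = {(s, (s + 1) % 16) for s in states}
error = {0b1101}

def hide_low_bit(s):
    """Make the fourth (least significant) bit invisible."""
    return s >> 1

a_states, a_init, a_trans, a_error = abstract(
    states, init, trans, error, hide_low_bit)

# Every concrete transition has an abstract counterpart by construction.
preserved = all((hide_low_bit(s), hide_low_bit(t)) in a_trans
                for (s, t) in trans)
```

The abstract model has 8 states, and the concrete error state 1101 maps to the abstract error state 110. Note that abstract self-loops such as (0, 0) appear because the concrete step from 0000 to 0001 stays inside one abstract state.
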

Definition 5 (Simulation Preorder). Let $M_1$ and $M_2$ be two LTSs. We say
$M_2$ simulates $M_1$, denoted $M_1 \preceq_{simulate} M_2$, if there exists a
homomorphic abstraction $H$ that abstracts $M_1$ to $M_2$.

Intuitively, $M_1 \preceq_{simulate} M_2$ indicates that $M_2$ can mimic all the
behaviours of $M_1$, or more precisely, all concrete paths in $M_1$ have
corresponding abstract paths in $M_2$. Note that in this definition of
abstraction, we cannot guarantee that every abstract path in $M_2$ can be mapped
back to a concrete path in $M_1$. This is why the homomorphic abstraction-based
model checking paradigm induces false negatives [6,1].

5 The Model Checking Algorithm 

5.1 Formulating the Problem 

To specify the problem, we denote $\hat{M} = H(M)$ if $\hat{M}$ is the LTS
constructed from $M$ w.r.t. $H$, although by definition $H$ only maps a concrete
state space to an abstract one. The ultimate goal is of course to check whether
a given model (concrete system), $M$, satisfies a given safety property
$AG\varphi$, i.e. whether $M \models AG\varphi$. Let $EP(M)$ denote the set of
all error paths in $M$. Obviously, $M \models AG\varphi$ iff $EP(M) = \emptyset$.

Let $\mathcal{H} = \{H_1, H_2, \ldots, H_n\}$ be $n$ homomorphic mapping
functions and $M$ be a concrete LTS. We abstract the concrete system
progressively as follows:

$M_n = H_n(M)$

$M_{n-1} = H_{n-1}(M_n) = H_{n-1}(H_n(M))$

$\ldots$

$M_1 = H_1(M_2) = H_1(H_2(\ldots(H_n(M))\ldots))$

where we denote the LTS $M_i$ by $(S^i, S_0^i, R^i, L^i, E^i)$. Thus,
$M \preceq_{simulate} M_n \preceq_{simulate} M_{n-1} \preceq_{simulate} \ldots \preceq_{simulate} M_1$.
We call the set $\{M, M_n, M_{n-1}, \ldots, M_1\}$ the abstraction sequence
w.r.t. $\mathcal{H}$, denoted $AS(M)$. We can now formulate the problem as
follows: Given a concrete LTS $M$¹ and an abstraction sequence $AS(M)$, how can
we determine whether $EP(M)$ is empty without blindly searching the concrete
LTS $M$?



5.2 How the Algorithm Works: A Running Example 

Before formally describing the algorithm, we illustrate how it works using a 
trivial example. In this example, we first show that the algorithm detects an error 
path in the concrete system. We then show the algorithm verifies the system if 
no error path exists. 

Consider the abstraction sequence shown in Figure 3. We represent a concrete
system $M$ using a 4-bit vector $(x_0 x_1 x_2 x_3)$ where $x_i$ is a Boolean
variable. Let (1101) be the (error) state that violates the safety property
(double-circled state). The task is to detect the error path originating from
(0000) that leads to the error state in $M$. We define a 2-level abstraction
sequence as shown in the figure. At the coarsest abstract level, $M_1$, the
third and fourth bits of the vector are made invisible, resulting in 4 states in
total. We omit a state’s transition to itself as it need not contribute to
finite error paths. As can be seen, the abstract error path, indicated by the
dotted-line oval, is detected by exhaustive search in $M_1$. Next we refine this
initial abstract error path to a less abstract level, $M_2$. In $M_2$, only the
fourth bit is made invisible and the abstract system consists of 8 states. In
searching for a more refined error path in $M_2$, we make use of the abstract
error path in $M_1$ as a heuristic for the distance from a state to the abstract
target state (110x). More precisely, our algorithm begins with (000x) in $M_2$,
and only explores the successor whose corresponding state in $M_1$ has the
minimum distance to the final state of the error path in $M_1$. Because the
heuristic search algorithm A* is used, we only need to process the states in the
dotted-line circle in $M_2$.² Once the abstract error path in $M_2$ has been
determined, we use the same method to refine it to the concrete level (the
dotted-line circle in $M$).

Fig. 3. A running example

¹ The safety property $AG\varphi$ under verification is embedded in $M$ as the set $E$.

² In the worst case, the algorithm is of course equivalent to breadth-first search.
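
The way a coarser model supplies distance estimates for a finer one can be sketched in Python as follows. The transition structure of Figure 3 is not recoverable from the text, so the sketch assumes a hypothetical 4-bit model that simply counts upwards; `distances_to` and the integer encoding of bit vectors are likewise illustrative choices.

```python
from collections import deque

def distances_to(targets, trans):
    """Backward breadth-first search: fewest steps from each state to a target."""
    preds = {}
    for s, t in trans:
        preds.setdefault(t, set()).add(s)
    dist = {t: 0 for t in targets}
    queue = deque(targets)
    while queue:
        t = queue.popleft()
        for s in preds.get(t, ()):
            if s not in dist:
                dist[s] = dist[t] + 1
                queue.append(s)
    return dist

# Hypothetical concrete model: 4-bit states as integers, counting upwards.
trans = {(s, s + 1) for s in range(15)}
error = {0b1101}
H = lambda s: s >> 2          # hide the third and fourth bits

# Abstract transitions, with self-loops omitted as in the running example.
abs_trans = {(H(s), H(t)) for s, t in trans if H(s) != H(t)}
abs_dist = distances_to({H(e) for e in error}, abs_trans)
conc_dist = distances_to(error, trans)

# Heuristic value of a concrete state: abstract distance of its image.
h = {s: abs_dist[H(s)] for s in conc_dist}
```

In this toy model the estimate for the initial state is 3 abstract steps, whereas the true concrete distance is 13 steps; the estimate never exceeds the true distance for any state that can reach the error.
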

5.3 Detecting Initial Abstract Error Paths

Initially, we abstract away information from the concrete system to produce an
initial abstraction that has fewer states, and identify the potential areas where
the error state may reside, while keeping the model size manageable. In essence,
we isolate that part of the state space of the initial abstraction that may contain
error paths. We call this the error region. An algorithm to find the error region
in $M_1$ is shown below.



Input: The most abstract LTS $M_1 = (S, S_0, R, L, E)$
Output: A set of abstract error paths, or NoEPFound
Algorithm: FindEP

S0: Initially, set Reachable = $S_0$ and EP = {};

S1: Repeat until Reachable = $S$ or Reachable does not change:

    1. Compute all successors of the states in Reachable and add them to
       Reachable;

    2. If Reachable $\cap$ $E$ is not empty, extract the abstract error path
       and put it into EP;

S2: If EP is not empty, then return EP; otherwise return NoEPFound.



This algorithm computes the set of error paths EP by doing a reachability
analysis of $M_1$. The set EP constitutes the error region. The significant
difference between this algorithm and standard fixed-point reachability
algorithms is that this algorithm detects and collects all error paths. This is
essential because when entering the next, finer level of abstraction we do not
want to examine the entire state space but only that part that contains an error
path. If we missed an abstract error path, this could result in concrete
counterexamples being missed. In the abstraction-refinement approach, it is not
necessary to find all abstract error paths in $M_1$ because only a single error
path is used to guide the refinement, and blind search of the entire state space
is used.
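
The reachability analysis in FindEP can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: it records one shortest path per reachable error state via breadth-first search with parent pointers, and the function name `find_ep` and the tiny example graph are assumptions.

```python
from collections import deque

def find_ep(init, trans, error):
    """Return one shortest error path per reachable error state,
    or None when no error state is reachable (NoEPFound)."""
    succ = {}
    for s, t in trans:
        succ.setdefault(s, []).append(t)
    parent = {s: None for s in init}   # doubles as the Reachable set
    queue = deque(init)
    paths = []
    while queue:
        s = queue.popleft()
        if s in error:
            # Walk the parent pointers back to an initial state.
            path, node = [], s
            while node is not None:
                path.append(node)
                node = parent[node]
            paths.append(path[::-1])
        for t in succ.get(s, []):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return paths or None

# Hypothetical abstract model with error state 3.
paths = find_ep({0}, {(0, 1), (1, 2), (2, 3), (1, 3)}, {3})
```

The search terminates once no new states are added, which corresponds to the fixed point in step S1.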

We prove two theoretical results about this algorithm. The first one says 
that if there is no abstract error path found at the coarsest level of abstraction, 
then the concrete system is error-free (w.r.t. the given property). The second 
one states the converse: if there are concrete error paths then there will be 
corresponding abstract error paths. 

Theorem 1. If the algorithm FindEP returns NoEPFound, then $M \models AG\varphi$.

Proof sketch: The algorithm only returns NoEPFound if no abstract error state
is reachable from the initial state in the abstract system. This means that there
is no abstract error path. By the definition of homomorphic abstractions, it is
impossible to have a concrete error path.

Theorem 2. If there exists an error path in the concrete model $M$ then the
corresponding abstract error path in $M_1$ must be included in EP returned by the
algorithm FindEP.

Proof sketch: The algorithm uses exhaustive forward search and will eventually 
reach the fixed-point as we are dealing with a finite state system. If the concrete 
system does have an error path, then by the definition of abstraction, it must 
have a corresponding abstract path in the abstract system, and this path will 
be detected by the exhaustive search. 

5.4 Refining Abstract Error Paths 

If the algorithm FindEP returns a set of abstract error paths
$EP = \{\pi_1, \ldots, \pi_k\}$, we need to refine $M_1$ to see which paths are
spurious. The refinement utilizes the pre-defined abstraction sequence. The
essence of the refinement is that the set EP of error paths is reduced by mapping
the set to the next finer abstraction level and checking for spuriousness. The
algorithm for refining a single abstract error path is shown below.



Input: The single abstract error path
$\pi = \hat{s}_1 \hat{s}_2 \ldots \hat{s}_k \in EP(M_i)$; two LTSs
$M_i = (\hat{S}, \hat{S}_0, \hat{R}, \hat{L}, \hat{E})$ and
$M_{i+1} = (S', S'_0, R', L', E')$ such that $M_i = H(M_{i+1})$
Output: An error path in $M_{i+1}$, or NoEPFound
Algorithm: RefineEP

S0: 1. Set Reachable, Open = {} and
       Map = $H^{-1}(\hat{s}_1) \cup \ldots \cup H^{-1}(\hat{s}_k)$;

    2. Set rank = 0 and states = $S'_0 \cap H^{-1}(\hat{s}_1)$;

    3. Set el = (rank, states) and put it into Open;

S1: Repeat:

    1. Select the element el = (rank, states) in Open with the least rank;

    2. Set Reachable = Reachable $\cup$ states;

    3. If states $\cap$ $E'$ is not empty, then return the error path in $M_{i+1}$;

    4. If Reachable does not change, then return NoEPFound;

    5. Compute all successors, next, of states;

    6. For each state $s \in$ next, set $d_s$ = distance from $H(s)$ to
       $\hat{s}_k$; if $s \notin$ Map, then $d_s = \infty$;

    7. For each $s \in$ next, set newrank = rank + 1 + $d_s$;

    8. Create a new element (newrank, s) and put it into Open;



When we refine the (initial) error region, we produce a set of error paths at
a finer level of abstraction. The paths in this new set are the shortest
representative paths that correspond to the previous (coarser) paths. The essence
of path refinement in the algorithm RefineEP is that we map a path in the
abstract model $M_i$ to a path in $M_{i+1}$. The states in this path are mapped
to a set Map, which must contain the error path if it is not spurious. To
determine the spuriousness we apply the A* algorithm (see the fragment labelled
S1 in the algorithm) where the heuristic is
$\hat{s}_1 \hat{s}_2 \ldots \hat{s}_k$, and the cost is the path length between
the state $\hat{s}_i$ and the error state $\hat{s}_k$. Note that if we discover
a state $s$ that lies outside of the error region (i.e. Map), then this state is
assigned a cost of $\infty$.
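
The refinement step can be sketched in Python along the following lines. This is an illustrative variant rather than a transcription of RefineEP: it keeps the path cost g separately and ranks entries by g + h in the standard A* form, it handles individual states instead of state sets, and it assumes the abstract path is a shortest one so that the remaining number of abstract steps is a lower bound. The name `refine_ep`, the integer states, and the example model are hypothetical.

```python
import heapq

def refine_ep(abs_path, init, trans, error, H):
    """A*-style search in the finer model for a shortest error path that
    stays inside Map = H^{-1}(abs_path). Returns None for NoEPFound."""
    # Heuristic: abstract steps remaining from H(s) to the end of abs_path.
    remaining = {a: len(abs_path) - 1 - i for i, a in enumerate(abs_path)}
    succ = {}
    for s, t in trans:
        succ.setdefault(s, []).append(t)
    # Open holds (f, g, state, path) with f = g + h; states must be orderable.
    open_heap = [(remaining[H(s)], 0, s, (s,))
                 for s in init if H(s) == abs_path[0]]
    heapq.heapify(open_heap)
    closed = set()
    while open_heap:
        f, g, s, path = heapq.heappop(open_heap)
        if s in closed:
            continue
        closed.add(s)
        if s in error and H(s) == abs_path[-1]:
            return list(path)
        for t in succ.get(s, []):
            # States outside Map have infinite cost and are never queued.
            if H(t) in remaining and t not in closed:
                heapq.heappush(
                    open_heap,
                    (g + 1 + remaining[H(t)], g + 1, t, path + (t,)))
    return None

# Hypothetical finer model: 4-bit counter; coarser model hides the low two bits.
trans = {(s, s + 1) for s in range(15)}
H = lambda s: s >> 2
real = refine_ep([0, 1, 2, 3], {0}, trans, {13}, H)
spurious = refine_ep([0, 1, 2, 3], {0}, trans - {(7, 8)}, {13}, H)
```

In the second call the concrete step from 7 to 8 is removed, so the search cannot leave the states mapped to abstract state 1 and the abstract path is reported as spurious.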

Theorem 3. If the algorithm RefineEP returns NoEPFound, then there is 
no concrete error path corresponding to the given abstract error path. 

Proof sketch: If the algorithm generates NoEPFound, we know that the refinement
cannot progress to the last state of the abstract error path. As can be seen
in Figure 4, there must exist some abstract state $\hat{s}_i$ along the abstract
path, such that all finer abstract states in the sequence characterized by
$\hat{s}_i$ are not strongly connected. As a result, the algorithm will cease
forward progress when it reaches a fixed point (all states inside the dotted
line in the figure). Thus, there is no less abstract error path and hence no
concrete error path either.




Fig. 4. Non-strong connectivity inside abstract state 



Theorem 4. The algorithm RefineEP will return the shortest error path in
$M_{i+1}$ corresponding to the error path $\pi$ in $M_i$ if one exists.

Proof sketch: Because the A* algorithm is used for the refinement, and the
abstract error paths serve as lower-bound heuristics [22], the shortest error
path is guaranteed by admissibility [24].
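
The lower-bound property used in this proof sketch can be spelled out as follows. The notation $d_M(s, t)$ for the length of a shortest path from $s$ to $t$ in $M$ is introduced here for illustration only; the first line restates the transition clause of Definition 4 for $M_i = H(M_{i+1})$.

```latex
\begin{align*}
(s, s') \in R' &\implies (H(s), H(s')) \in \hat{R}, \\
s_0 \to \dots \to s_m \text{ in } M_{i+1}
  &\implies H(s_0) \to \dots \to H(s_m) \text{ in } M_i, \\
d_{M_i}\bigl(H(s), H(t)\bigr) &\le d_{M_{i+1}}(s, t).
\end{align*}
```

An estimate that never exceeds the true remaining cost is what A* requires of an admissible heuristic.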

5.5 The Static Abstraction Guided (SAG) Model Checking 
Algorithm 

The complete algorithm that verifies a model by using a statically determined 
abstraction sequence is shown below. 

Input: The abstraction sequence $AS(M) = \{M_1, M_2, \ldots, M_n\}$;

Output: A set of concrete error paths ConcEP, or SAFE if the system is
verified.

Algorithm: SAG

S0: 1. Set ConcEP = {} and pass $M_1$ to FindEP;

    2. If FindEP returns NoEPFound, then return SAFE; otherwise let EP be
       the returned set of abstract error paths;

S1: For each $\pi \in$ EP repeat:

    1. For $i$ from 1 to $n - 1$ repeat:

       a) Pass $\pi$, $M_i$ and $M_{i+1}$ to RefineEP;

       b) If NoEPFound is returned, then go to S1 (continue with the next
          path); otherwise replace $\pi$ by the returned error path;

    2. Put the concrete error path into ConcEP;

S2: If ConcEP is not empty, then return ConcEP; otherwise return
SAFE;



The SAG algorithm first determines the error paths at the coarsest level of
abstraction, $M_1$, by calling FindEP. It then repeatedly refines each error path
in every abstract model in $AS$ using RefineEP. Note that in the version of
the algorithm above, we descend the abstraction sequence for each error path
in EP. This is a depth-first approach, but we could of course also have used
a breadth-first approach. If no error paths are found in the last model in the
abstraction sequence, which corresponds to the original, concrete model $M$,
then SAFE is returned; otherwise a set of error paths is returned. These error
paths are concrete counterexamples.
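
The depth-first control flow of SAG can be sketched in Python as follows. To keep the sketch short, the two sub-algorithms are passed in as functions; the name `sag`, the string return value "SAFE", and the stand-in functions in the usage example are hypothetical and only exercise the driver logic.

```python
def sag(models, find_ep, refine_ep):
    """Depth-first SAG driver.

    models: abstraction sequence, coarsest model first, concrete model last.
    find_ep(model) returns a list of error paths or None (NoEPFound).
    refine_ep(path, coarse, fine) returns a path in `fine` or None."""
    abstract_paths = find_ep(models[0])
    if abstract_paths is None:
        return "SAFE"
    concrete_paths = []
    for path in abstract_paths:
        for coarse, fine in zip(models, models[1:]):
            path = refine_ep(path, coarse, fine)
            if path is None:      # spurious at this level: drop the path
                break
        else:
            # Reached the last model without failing: a real counterexample.
            concrete_paths.append(path)
    return concrete_paths or "SAFE"

# Stand-ins for illustration: paths starting with "b" turn out to be spurious
# at the concrete level "M".
models = ["M1", "M2", "M"]
stub_find = lambda model: [["a"], ["b"]]
stub_refine = lambda p, coarse, fine: (
    None if p[0] == "b" and fine == "M" else p + [fine])
result = sag(models, stub_find, stub_refine)
```

A breadth-first variant would instead refine all surviving paths at one level before moving to the next.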

Theorem 5. If the SAG algorithm returns SAFE, then the concrete model 
does NOT contain an error path, and hence the model is verified. 

Proof sketch: The proof of this theorem follows immediately from the algorithms 
it calls: FindEP and RefineEP. If no abstract error path is found at the coars- 
est level of abstraction, then the system is verified according to Theorem 1. If 
there is no error path in the concrete system, all the refinements should return 
NoEPFound and we will verify the system. 

6 Implementation Issues 

An important implementation issue to be considered is how to deal with the 
abstraction sequence. Although the abstraction sequence is pre-defined, we do 
not need to construct all abstractions beforehand. We in fact only need to deter- 
mine how each abstraction in the sequence will be constructed. For example, we 
can define the abstraction sequence by simply defining the range hierarchy of all 
global variables that represent the state of a system. Thus, we only construct the 
abstraction when it is necessary. Furthermore, we do not have to construct the 
entire state space at each level as we will have established that in some regions
of the state space no errors can exist. 

Another key aspect of this work is the use of a heuristic search algorithm to
refine the abstract counterexamples. Unfortunately, the conventional A* algorithm
usually suffers from an exponential space problem, while its memory-efficient
version, IDA*, sacrifices time for space. Recently, several memory-efficient A*
algorithms have been proposed [16,25,26,21]. These use symbolic data structures
called Binary Decision Diagrams (BDDs) to represent the state space. BDD- 
based heuristic search algorithms can potentially cope with very large problems. 
Moreover, BDDs can be used to represent the heuristics [22], so an A*-based 
model checking algorithm can be placed in a pure BDD setting. It is expected 
that a BDD-based SAG algorithm will provide the best performance. 

7 Conclusion and Future Work 

In this work we have presented a model checking framework that is based on 
both abstraction and heuristic search. This work extends earlier work [22,21] 
that demonstrated that an abstract model can be used as a heuristic in a model 
checking algorithm. 

Although abstraction-based techniques have long been the subject of model 
checking research, the integration of heuristic search and abstraction refinement 
sets this work apart. The abstraction-refinement approach uses counterexamples 
to guide model refinement, whereas our approach uses abstraction to guide the 
refinement of counterexamples. 

This approach is aimed at detecting errors in a system, so it favours systems
that contain errors. The approach is hence most effective if used in the
early stages of system design. An interesting future direction for this research 
would be to compare the ‘bug-finding’ ability of this method and the abstraction- 
refinement approach (for example). SAT-based techniques that help extract 
heuristic information from models would also be interesting, as well as making 
the abstraction sequence partially static and partially dynamic. The framework 
we develop in this paper only deals with the safety property of the system. 
Extending this to model check more general property classes is also possible. 

Abstract. In the development of real-time (communicating) hardware
or embedded-software systems, it is frequently the case that we want 
to refine/optimize the system’s internal behavior while preserving the 
external timed I/O behavior (that is, the interface protocol). In such a 
design refinement, modification of the systems’ internal branching struc- 
tures, as well as re-scheduling of internal actions, may frequently oc- 
cur. Our goal is, then, to ensure that such branch optimization and 
re-scheduling of internal actions preserve the systems’ external timed 
behavior, which is typically formalized by the notion of (timed) failure 
equivalence since it is less sensitive to the difference of internal branch- 
ing structures than (timed) weak bisimulation. In order to know the 
degree of freedom of such re-scheduling, parametric analysis is useful. 
The model suitable for such an analysis is a parametric time-interval
automaton (PTIA), which is a subset of a parametric timed automaton [1].
It has only a time interval with upper- and lower-bound parameters as a 
relative timing constraint between consecutive actions. In this paper, we first
propose an abstraction algorithm for PTIAs which preserves global timed
bisimulation [2]. Global timed bisimulation is weaker than timed weak
bisimulation and is a sufficient condition for timed failure equivalence. We
then show that, after applying our algorithm, the reduced PTIA has no internal
actions, and thus the problem of deriving a parameter condition under which two
given models are global timed bisimilar can be reduced to the existing
parametric strong bisimulation equivalence checking [3]. We also apply our
proposed equivalence checking algorithm to vulnerability checking for timing
attacks on web privacy.



1 Introduction 

1.1 Purpose and Objective 

In recent years, an effective development methodology for hardware/embedded-software
with real time constraints has been in demand. Precise implementation of timing
constraints for I/O behavior is becoming important not only in embedded systems
like mobile phones but also in infrastructure systems for transportation,
medicine, finance and defense. But, as described in [4], it is almost impossible
to verify the timing properties of real time systems by formal methods only. In
this paper, we consider the following real time system development methodol- 
ogy: first, a skeleton code including only I/O actions with real time requirements 
as comments like Esterel with pragmas [5] is given; secondly, correctness of the 
I/O timing behavior in the skeleton code is verified by some heuristic method, 
as in [4]; and finally, refinement of the skeleton code for detailed implementa- 
tion is performed. In this methodology, it is important to verify the equivalence 
of I/O timing behavior between the initial design code and its refined code. 
We also take into consideration in the verification that branch restructuring of 
codes is performed in the refinement of the skeleton code. Moreover, it would 
be useful if we put real time constraints containing parameters (e.g. upper- 
/lower-bounds), and derive automatically the constraint (e.g. the minimum or 
maximum value allowed) of parameters in which the equivalence is preserved. 
Such an analysis is called a parametric analysis [1,6]. To capture the control
flow with time constraints and perform a parametric analysis, we propose a
parametric time-interval automaton (PTIA), which is a subset of a parametric
timed automaton [1] having only a time interval with upper- and lower-bound
parameters as a relative timing constraint between consecutive actions. We show that
global timed bisimulation (GTB) equivalence checking for PTIAs can be re- 
duced to existing parametric strong timed bisimulation equivalence checking, 
where GTB is a weakening of timed weak bisimulation, in that internal branch 
structures are ignored. 



1.2 Related Work 

There are some proposals of parametric analyses for bisimulation equivalence.
For bisimulation without time, parametric strong/weak bisimulation equivalence
checking algorithms on STG (Symbolic Transition Graph) and STGA (STG with
Assignment) have already been proposed [7,8,9]. For timed strong bisimulation
equivalence (bisimulation equivalence where both time and all actions are
considered observable), parametric equivalence checking is proposed in [3].
However, for timed weak bisimulation equivalence (bisimulation equivalence where
time is considered observable and internal actions are not considered
observable), as far as we know, no parametric equivalence checking algorithm has
been proposed. As for research about real time software design methodology, in
[5] Esterel is extended to describe software with real time constraints given as
comments, and timing properties are then verified by a model checker.



1.3 Why Global Timed Bisimulation? 

In the development of real time software, several optimizations to meet real 
time requirements are usually done by using a profiler. In particular, branch 
restructuring plays an important role in the optimization. It is true that timed
weak bisimulation was proposed to determine equivalence of processes considering
both time and observability [10], but as pointed out by [2], timed weak
bisimulation may not be suitable for equivalence checking of real time software 
in the presence of optimization via branch restructuring. Therefore we employ 
GTB to determine equivalence of processes, since GTB is a weakening of timed 
weak bisimulation in that internal branch structures are ignored. 



1.4 Brief Description of Proposed Method 

We take a skeleton code and a refined code, and convert them to a PTIA as 
an internal representation, where the skeleton code describes I/O behaviors and 
real time requirements between some of the I/O actions. Note that we also put 
a parametric timing constraint to each internal or I/O action which has no real 
time requirement. (Assigning some values to such parameters means giving some 
concrete scheduling of the behavior.) The refined code is designed by inserting 
detailed internal actions into the skeleton code and by dividing some real time 
constraints between I/O actions in the skeleton code into constraints between 
I/O and internal actions. In this translation, if input actions are described in 
branch or loop conditions, we delete such actions from branch and loop condi- 
tions while preserving real time constraints by inserting temporal variables, and 
then all branches are abstracted to non-deterministic ones. We extract branch 
and loop structure from a PTIA in order to merge a series of internal actions be- 
tween I/O actions along with control flow while preserving real time constraints. 
With this transformation, we can convert the original PTIA to a PTIA with- 
out internal actions, so that we can apply an existing parametric timed strong 
bisimulation equivalence checking algorithm to compare an initial skeleton code 
and a refined implementation code. 



1.5 Paper Organization 

In Section 2, we define the PTIA model and its operational semantics by defining 
a mapping from the model to a timed extension of labelled transition system 
(timed LTS). Section 3 describes the definition of global timed bisimulation 
on the timed LTS. We propose a transformation algorithm on the PTIA and 
prove that the transformation preserves global timed bisimulation equivalence 
in Section 4. In Section 5, we propose a global timed bisimulation equivalence 
checking algorithm and we apply it to vulnerability checking for timing attacks 
on Web privacy. Conclusions and future directions are given in Section 6.



2 Parametric Time-Interval Automata 

Let $Act$ and $Var$ denote a set of actions and a set of variables, respectively. We
denote the set of real numbers by $\mathbb{R}$ and the set of non-negative real numbers
by $\mathbb{R}^{+}$. Let $Intvl(Var)$ denote the set of formulas of the form $e_1 \le t$,
$t \le e_2$, or $e_1 \le t \wedge t \le e_2$, where $e_1$ and $e_2$ are linear arithmetic
expressions (that is, only addition and subtraction are allowed) over variables in
$Var \setminus \{t\}$ and constants in $\mathbb{R}$, and $t \in Var$ is the special variable
representing the time elapsed since the latest visit to the current control state.

Definition 1 A parametric time-interval automaton is a tuple
$\langle S, \{t\}, PVar, E, s_{init} \rangle$, where $S$ is a finite set of control states
(also referred to as locations), $t \in Var$ is the clock variable, $PVar \subseteq Var$
is a finite set of parameters, $E \subseteq S \times (Act \cup \{\tau\}) \times Intvl(PVar) \times S$
is a transition relation, and $s_{init}$ is the initial state. Note that $\tau$ represents
an internal action. On the other hand, every other action in $Act$ represents an
observable action. We write $s_i \xrightarrow{a@?t[P]} s_j$ if $(s_i, a, P, s_j) \in E$. □

Informally, a transition $s_i \xrightarrow{a@?t[P]} s_j$ means that the action $a$ can be
executed from $s_i$ when the values of both the clock variable $t$ and the parameters
satisfy the formula $P$ (called a guard condition), and after the execution, the state
moves into $s_j$ and the clock variable $t$ is reset to zero. In any state $s$, the value
of the clock variable $t$ increases continuously, representing the passage of time.

The formal semantics of parametric time-interval automata is similar to that of general
parametric timed automata, and is defined as follows. The values of clocks
and parameters are given by a function $\sigma : (\{t\} \cup PVar) \to \mathbb{R}$. We refer
to such a function as a value-assignment. We represent the set of all value-assignments
by $Val$. We write $\sigma \models P$ if a formula $P \in Intvl(Var)$ is true under a
value-assignment $\sigma \in Val$. The semantic behavior of a parametric timed automaton
is given as a semantic transition system on concrete states. A concrete state is
represented by $(s, \sigma)$, where $s$ is a control state and $\sigma$ is a value-assignment.
Let $CS \stackrel{def}{=} \{(s, \sigma) \mid s \in S, \sigma \in Val\}$ be the set of concrete
states. The semantic model is a timed labelled transition system (timed LTS). A
state of a timed LTS is a concrete state in $CS$. A transition of a timed LTS is
either a delay-transition or an action-transition. A delay-transition represents the
passage of time within the same control state $s \in S$, whereas an action-transition
represents the execution of an action which changes the control state to the next
one $s'$. Formally, a timed labelled transition system is defined as follows.

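Definition 2 A timed labelled transition system (a timed LTS for short)
for a parametric time-interval automaton is a labelled transition system
$\langle CS, Act \cup \mathbb{R}^{+} \cup \{\tau\}, CE, (s_{init}, \sigma_{init}[t \mapsto 0]) \rangle$,
where the set of states is $CS$, the set of labels is $Act \cup \mathbb{R}^{+} \cup \{\tau\}$,
the initial state is $(s_{init}, \sigma_{init}[t \mapsto 0])$, and the transition relation
$CE \subseteq CS \times (Act \cup \mathbb{R}^{+} \cup \{\tau\}) \times CS$ is defined as the minimum
set that satisfies the following conditions (in the following, we write
$(s, \sigma) \xrightarrow{l} (s', \sigma')$ if $((s, \sigma), l, (s', \sigma')) \in CE$):

(i) $(s, \sigma) \xrightarrow{v} (s, \sigma + v)$ if $v \in \mathbb{R}^{+}$,

(ii) $(s, \sigma) \xrightarrow{a} (s', \sigma[t \mapsto 0])$ if $a \in Act \cup \{\tau\}$, $s \xrightarrow{a@?t[P]} s'$, and $\sigma \models P$,

where $\sigma + v$ and $\sigma[t \mapsto 0]$ are the value-assignments derived from $\sigma$,
which are defined as follows: for $x \in PVar \cup \{t\}$,

$$(\sigma + v)(x) \stackrel{def}{=} \begin{cases} \sigma(x) + v & \text{if } x \in \{t\}, \\ \sigma(x) & \text{otherwise,} \end{cases}
\qquad
(\sigma[t \mapsto 0])(x) \stackrel{def}{=} \begin{cases} 0 & \text{if } x \in \{t\}, \\ \sigma(x) & \text{otherwise.} \end{cases}$$

□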
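The two kinds of transitions in Definition 2 can be sketched in Python as follows. This is only an illustrative sketch: the names `Edge`, `satisfies`, `delay`, and `fire`, the label `"tau"`, and the encoding of bounds as Python functions over a value-assignment are assumptions of this example and do not come from the paper.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict, Optional, Tuple

# Hypothetical encoding: a value-assignment maps "t" and each parameter to a number.
Valuation = Dict[str, Fraction]
Bound = Callable[[Valuation], Fraction]   # a linear expression over the parameters
State = Tuple[str, Valuation]             # concrete state (s, sigma)

TAU = "tau"  # label used in this sketch for the internal action


@dataclass(frozen=True)
class Edge:
    src: str
    action: str
    lower: Optional[Bound]  # guard part e1 <= t (None if absent)
    upper: Optional[Bound]  # guard part t <= e2 (None if absent)
    dst: str


def satisfies(sigma: Valuation, edge: Edge) -> bool:
    """Check sigma |= P for the interval guard P of the edge."""
    t = sigma["t"]
    if edge.lower is not None and t < edge.lower(sigma):
        return False
    if edge.upper is not None and t > edge.upper(sigma):
        return False
    return True


def delay(state: State, v: Fraction) -> State:
    """Delay-transition: only the clock t advances, by v >= 0."""
    if v < 0:
        raise ValueError("delay must be non-negative")
    s, sigma = state
    return (s, {**sigma, "t": sigma["t"] + v})


def fire(state: State, edge: Edge) -> Optional[State]:
    """Action-transition: allowed only if the guard holds; resets t to 0."""
    s, sigma = state
    if edge.src != s or not satisfies(sigma, edge):
        return None
    return (edge.dst, {**sigma, "t": Fraction(0)})


# Usage: guard p <= t <= p + 2 with the parameter p fixed to 3.
edge = Edge("s0", "a", lambda v: v["p"], lambda v: v["p"] + 2, "s1")
start: State = ("s0", {"t": Fraction(0), "p": Fraction(3)})
too_early = fire(start, edge)                   # guard fails at t = 0
later = fire(delay(start, Fraction(4)), edge)   # guard holds at t = 4
```

Parameters are left untouched by both transition kinds, which mirrors the fact that only the clock $t$ is changed by $\sigma + v$ and $\sigma[t \mapsto 0]$.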



3 Global Timed Bisimulation 

In this section, we briefly recall the definition of global timed bisimulation (GTB)
proposed in Ref. [2], as well as the definition of the traditional timed weak
bisimulation [11,10] (TWB) and its relation to the GTB.



3.1 Timed Weak Bisimulation



In this section, we will briefly give the definition of timed weak bisimulation.



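Definition 3 A timed weak transition relation $\Rightarrow_w$ on states of a timed LTS
$\langle CS, Act \cup \mathbb{R}^{+} \cup \{\tau\}, CE, (s_0, \sigma_0) \rangle$ is defined as follows:

1. $(s, \sigma) \stackrel{\tau}{\Rightarrow}_w (s', \sigma') \stackrel{def}{=} (s, \sigma) (\xrightarrow{\tau})^{*} (s', \sigma')$

2. $(s, \sigma) \stackrel{v}{\Rightarrow}_w (s', \sigma')$ $(v \in \mathbb{R}^{+})$
   $\stackrel{def}{=} \exists v_1, v_2, \ldots, v_n \in \mathbb{R}^{+}$ [ $v = \sum_{i=1}^{n} v_i$
   $\wedge\ \exists s_1, \sigma_1, \sigma_1', s_2, \sigma_2, \sigma_2', \ldots, s_n, \sigma_n, \sigma_n'$
   s.t. $(s, \sigma) \stackrel{\tau}{\Rightarrow}_w (s_1, \sigma_1) \xrightarrow{v_1} (s_1, \sigma_1') \stackrel{\tau}{\Rightarrow}_w \cdots \stackrel{\tau}{\Rightarrow}_w (s_n, \sigma_n) \xrightarrow{v_n} (s_n, \sigma_n') \stackrel{\tau}{\Rightarrow}_w (s', \sigma')$ ]

3. $(s, \sigma) \stackrel{a}{\Rightarrow}_w (s', \sigma')$ $(a \in Act)$
   $\stackrel{def}{=} (s, \sigma) \stackrel{\tau}{\Rightarrow}_w \xrightarrow{a} \stackrel{\tau}{\Rightarrow}_w (s', \sigma')$

□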



By using this transition relation, timed weak bisimulation is defined as follows: 



Definition 4 A binary relation $R$ on states of a timed LTS is a timed weak
bisimulation if the following condition holds:

If $(s_1, \sigma_1) R (s_2, \sigma_2)$, then for any $a \in Act \cup \mathbb{R}^{+} \cup \{\tau\}$,

1. $\forall s_1', \sigma_1'$ [ $(s_1, \sigma_1) \xrightarrow{a} (s_1', \sigma_1') \Longrightarrow \exists s_2', \sigma_2'$ [ $(s_2, \sigma_2) \stackrel{a}{\Rightarrow}_w (s_2', \sigma_2') \wedge (s_1', \sigma_1') R (s_2', \sigma_2')$ ] ],
   and,

2. $\forall s_2', \sigma_2'$ [ $(s_2, \sigma_2) \xrightarrow{a} (s_2', \sigma_2') \Longrightarrow \exists s_1', \sigma_1'$ [ $(s_1, \sigma_1) \stackrel{a}{\Rightarrow}_w (s_1', \sigma_1') \wedge (s_1', \sigma_1') R (s_2', \sigma_2')$ ] ]

We say that states $(s_1, \sigma_1)$ and $(s_2, \sigma_2)$ are timed weak bisimulation
equivalent, denoted by $(s_1, \sigma_1) \equiv_{twb} (s_2, \sigma_2)$, if and only if there exists a
timed weak bisimulation $R$ such that $(s_1, \sigma_1) R (s_2, \sigma_2)$. □

Fig. 1. Simultaneous Choice of Internal Actions

Fig. 2. Forward Resolution of Nondeterminism



3.2 Global Timed Bisimulation 



Global timed bisimulation [2] is a variant of bisimulation equivalence which
considers both timing and observability. Unlike timed weak bisimulation, global
timed bisimulation does not distinguish differences in the branching structures of
internal transitions.

Firstly, in order to make it suitable for our formalism, we rephrase the definition
of a static generalized transition relation $\to_{gt}$ and a dynamic generalized
transition relation $\stackrel{a}{\leadsto}_{gt}$, which are proposed in Ref. [2].

The intention of the following definition is that, from the original timed
LTS, we want to construct a new timed LTS using $\to_{gt}$ and $\stackrel{a}{\leadsto}_{gt}$ such that
the constructed LTS contains many other possible branching structures derived from
the original one, which are practically indistinguishable by any external observer
(that is, any realistic tester).



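Definition 5 Let $OGT((s, \sigma))$ be the set of all outgoing transitions of $(s, \sigma)$,
that is, the set of all transitions whose source state is $(s, \sigma)$. A static generalized
transition relation $\to_{gt}$ (also denoted by $\stackrel{\epsilon}{\leadsto}_{gt}$) and a dynamic generalized
transition relation $\stackrel{a}{\leadsto}_{gt}$ on states of a timed LTS are defined as follows:

1. preserving the timed weak transition relation

   a) If $(s, \sigma) \stackrel{\tau}{\Rightarrow}_w (s', \sigma')$, then $(s, \sigma) \to_{gt} (s', \sigma')$.

   b) For each $a \in Act \cup \mathbb{R}^{+}$, if $(s, \sigma) \stackrel{a}{\Rightarrow}_w (s', \sigma')$, then $(s, \sigma) \stackrel{a}{\leadsto}_{gt} (s', \sigma')$.

2. simultaneous choice of internal transitions (see Fig. 1)

   If $OGT((s, \sigma)) = \bigcup_{i \in I} \{(s, \sigma) \xrightarrow{\alpha_i} (s_{\alpha_i,i}, \sigma_{\alpha_i,i})\} \cup \bigcup_{j \in J} \{(s, \sigma) \xrightarrow{\tau} (s_j, \sigma_j)\}$
   $(\alpha_i \in Act \cup \mathbb{R}^{+})$, then $(s, \sigma) \to_{gt} (s_{new}, \sigma)$ and for all $j \in J$,
   $(s_{new}, \sigma) \stackrel{\epsilon}{\leadsto}_{gt} (s_j, \sigma_j)$, where $s_{new}$ is a newly introduced control state.

3. forward resolution of nondeterminism (see Fig. 2)

   If $OGT((s, \sigma)) = \bigcup_{i \in I} \{(s, \sigma) \xrightarrow{\alpha_i} (s_{\alpha_i,i}, \sigma_{\alpha_i,i})\} \cup \bigcup_{j \in J} \{(s, \sigma) \xrightarrow{a} (s_j, \sigma_j)\}$
   $(\alpha_i \in Act \cup \mathbb{R}^{+}$ and $a \in Act)$, then for each $j \in J$, $(s, \sigma) \to_{gt} (s_{new}^{j}, \sigma)$,
   $(s_{new}^{j}, \sigma) \stackrel{a}{\leadsto}_{gt} (s_j, \sigma_j)$, and $(s_{new}^{j}, \sigma) \stackrel{\alpha_i}{\leadsto}_{gt} (s_{\alpha_i,i}, \sigma_{\alpha_i,i})$ for any $i \in I$,
   where $s_{new}^{j}$ is a newly introduced control state depending on the choice of $j$.

4. forward execution of an internal transition (see Fig. 3)

   If $OGT((s, \sigma)) = \bigcup_{i \in I} \{(s, \sigma) \xrightarrow{\alpha_i} (s_{\alpha_i,i}, \sigma_{\alpha_i,i})\} \cup \{(s, \sigma) \xrightarrow{\tau} (s_\beta, \sigma_\beta)\}$
   $(\alpha_i \in Act \cup \mathbb{R}^{+}$ and $\beta \in Act \cup \mathbb{R}^{+})$ and if $(s_\beta, \sigma_\beta) \stackrel{\beta}{\leadsto}_{gt} (s_\beta', \sigma_\beta')$ for some
   $\beta$, then $(s, \sigma) \to_{gt} (s_{new}^{\beta}, \sigma)$, $(s_{new}^{\beta}, \sigma) \stackrel{\beta}{\leadsto}_{gt} (s_\beta', \sigma_\beta')$, and for any $i \in I$,
   $(s_{new}^{\beta}, \sigma) \stackrel{\alpha_i}{\leadsto}_{gt} (s_{\alpha_i,i}, \sigma_{\alpha_i,i})$, where $s_{new}^{\beta}$ is a newly introduced control state
   depending on the choice of $\beta$.

5. simultaneous execution of an observable action (see Fig. 4)

   If $OGT((s, \sigma)) = \bigcup_{i \in I} \{(s, \sigma) \xrightarrow{a} (s_i, \sigma_i)\} \cup \bigcup_{j \in J} \{(s, \sigma) \xrightarrow{\alpha_j} (s_{\alpha_j,j}, \sigma_{\alpha_j,j})\}$
   $(a \in Act$ and $\alpha_j \in Act \cup \mathbb{R}^{+})$, then for any non-empty subset $I' \subseteq I$,
   $(s, \sigma) \stackrel{a}{\leadsto}_{gt} (s_{new}^{I'}, \sigma)$ and for any $i' \in I'$, $(s_{new}^{I'}, \sigma) \stackrel{\epsilon}{\leadsto}_{gt} (s_{i'}, \sigma_{i'})$, where
   $s_{new}^{I'}$ is a newly introduced control state depending on the choice of $I'$.

6. synchronized forward time passage of internal transitions (see Fig. 5)

   Let $\stackrel{\epsilon}{\Rightarrow}_{gt} \stackrel{def}{=} (\to_{gt})^{*}$,
   $\stackrel{a}{\Rightarrow}_{gt} \stackrel{def}{=} \stackrel{\epsilon}{\Rightarrow}_{gt} \stackrel{a}{\leadsto}_{gt} \stackrel{\epsilon}{\Rightarrow}_{gt}$ for $a \in Act$, and
   $\stackrel{v}{\Rightarrow}_{gt} \stackrel{def}{=} \stackrel{\epsilon}{\Rightarrow}_{gt} \stackrel{v_1}{\leadsto}_{gt} \stackrel{\epsilon}{\Rightarrow}_{gt} \cdots \stackrel{v_n}{\leadsto}_{gt} \stackrel{\epsilon}{\Rightarrow}_{gt}$ for $v \in \mathbb{R}^{+}$
   if there exist some $v_1, \ldots, v_n$ such that $v = v_1 + \cdots + v_n$.

   If $OGT((s, \sigma)) = \bigcup_{i \in I} \{(s, \sigma) \xrightarrow{\alpha_i} (s_{\alpha_i,i}, \sigma_{\alpha_i,i})\} \cup \bigcup_{j \in J} \{(s, \sigma) \xrightarrow{\tau} (s_j, \sigma_j)\}$
   $(\alpha_i \in Act \cup \mathbb{R}^{+})$ and if for $v \in \mathbb{R}^{+}$, $(s_j, \sigma_j) \stackrel{v}{\Rightarrow}_{gt} (s_j^{(v)}, \sigma_j^{(v)})$ for all $j \in J$,
   then $(s, \sigma) \stackrel{v}{\leadsto}_{gt} (s_{new}, \sigma_{new})$ and $(s_{new}, \sigma_{new}) \stackrel{\epsilon}{\leadsto}_{gt} (s_j^{(v)}, \sigma_j^{(v)})$ for all
   $j \in J$, where $s_{new}$ is a newly introduced control state depending on the time
   value $v$, and $\sigma_{new}$ is a value-assignment depending on $v$, $s$, and $\sigma$ such that
   $(s, \sigma) \xrightarrow{v} (s, \sigma_{new})$. □

Fig. 3. Forward Execution of an Internal Transition

Fig. 4. Simultaneous Execution of an Observable Action

Fig. 5. Synchronized Forward Time Passage of Internal Transitions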

Using these transition relations, global timed bisimulation is defined as follows.



Definition 6 A binary relation $R$ on states of a timed LTS is a global timed
bisimulation if the following condition holds:

If $(s_1, \sigma_1) R (s_2, \sigma_2)$, then for any $a \in Act \cup \mathbb{R}^{+} \cup \{\epsilon\}$,

1. $\forall s_1', \sigma_1'$ [ $(s_1, \sigma_1) \stackrel{a}{\leadsto}_{gt} (s_1', \sigma_1') \Longrightarrow \exists s_2', \sigma_2'$ [ $(s_2, \sigma_2) \stackrel{a}{\Rightarrow}_{gt} (s_2', \sigma_2') \wedge (s_1', \sigma_1') R (s_2', \sigma_2')$ ] ],
   and,

2. $\forall s_2', \sigma_2'$ [ $(s_2, \sigma_2) \stackrel{a}{\leadsto}_{gt} (s_2', \sigma_2') \Longrightarrow \exists s_1', \sigma_1'$ [ $(s_1, \sigma_1) \stackrel{a}{\Rightarrow}_{gt} (s_1', \sigma_1') \wedge (s_1', \sigma_1') R (s_2', \sigma_2')$ ] ]

We say that states $(s_1, \sigma_1)$ and $(s_2, \sigma_2)$ are global timed bisimulation
equivalent, denoted by $(s_1, \sigma_1) \equiv_{gtb} (s_2, \sigma_2)$, if and only if there exists a
global timed bisimulation $R$ such that $(s_1, \sigma_1) R (s_2, \sigma_2)$. □

Concerning the relationship between timed weak bisimulation and global
timed bisimulation, the following proposition holds [2].

Proposition 1 For any two states $(s_1, \sigma_1)$ and $(s_2, \sigma_2)$ of a timed LTS, if
$(s_1, \sigma_1) \equiv_{twb} (s_2, \sigma_2)$, then $(s_1, \sigma_1) \equiv_{gtb} (s_2, \sigma_2)$. □



4 Abstraction Algorithm 

In this section, we propose some abstraction rules to eliminate internal actions
of the PTIA, and show that these rules preserve global timed bisimulation equivalence.

In the following, we first define some preliminary notation in order to
present the proposed abstraction rules simply. Next, we describe the proposed
abstraction rules. Finally, we show that the abstraction rules preserve global
timed bisimulation equivalence.

Note that in the discussion below, we do not distinguish a time constraint
(a set of integer inequalities representing a time interval) and an interval on
(non-negative) real numbers.

4.1 Preliminary Definitions

Definition 7 For the set of all intervals on real numbers
$\mathcal{K} = \{\{t \mid e_1 \le t \le e_2\} \mid e_1, e_2 \in \mathbb{R}^{+} \cup \{\infty\}\}$,
we define a binary operation $\oplus$ on $\mathcal{K}$ as follows:

For $x = \{t \mid x_1 \le t \le x_2\}$, $y = \{t \mid y_1 \le t \le y_2\} \in \mathcal{K}$,

$$x \oplus y \stackrel{def}{=} \{t \mid \exists t_1, t_2 \text{ s.t. } t = t_1 + t_2 \wedge t_1 \in x \wedge t_2 \in y\} = \{t \mid x_1 + y_1 \le t \le x_2 + y_2\}$$ □



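Now we define $Pred \stackrel{def}{=} \{X_1 \cup X_2 \cup \cdots \cup X_n \mid X_1, X_2, \ldots, X_n \in \mathcal{K}, n \in \mathbb{N}\}$,
where $\mathbb{N}$ is the set of all natural numbers. The set union operator $\cup$ is naturally
considered as a binary operation on $Pred$. Thus, we extend the binary operator
$\oplus$ to an operator on $Pred$ as follows:

For any $pred_1, pred_2 \in Pred$,
$pred_1 \oplus pred_2 \stackrel{def}{=} \{t \mid \exists t_1, t_2 \text{ s.t. } t = t_1 + t_2 \wedge t_1 \in pred_1 \wedge t_2 \in pred_2\}$

Example 1 Suppose $pred_1 = x \cup y$ and $pred_2 = z$, where $x = \{t \mid x_1 \le t \le x_2\}$,
$y = \{t \mid y_1 \le t \le y_2\}$, $z = \{t \mid z_1 \le t \le z_2\} \in \mathcal{K}$. Then, we have

$$\begin{aligned}
pred_1 \oplus pred_2 &= \{t \mid \exists t_1, t_2 \text{ s.t. } t = t_1 + t_2 \wedge t_1 \in (x \cup y) \wedge t_2 \in z\} \\
&= \{t \mid \exists t_1, t_2 \text{ s.t. } t = t_1 + t_2 \wedge (t_1 \in x \vee t_1 \in y) \wedge t_2 \in z\} \\
&= \{t \mid \exists t_1, t_2 \text{ s.t. } t = t_1 + t_2 \wedge (t_1 \in x \wedge t_2 \in z \vee t_1 \in y \wedge t_2 \in z)\} \\
&= \{t \mid x_1 + z_1 \le t \le x_2 + z_2 \vee y_1 + z_1 \le t \le y_2 + z_2\} \\
&= \{t \mid x_1 + z_1 \le t \le x_2 + z_2\} \cup \{t \mid y_1 + z_1 \le t \le y_2 + z_2\} \in Pred.
\end{aligned}$$ □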
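Example 1 can be replayed on concrete numbers. The following is a minimal sketch, assuming that an element of $Pred$ is encoded as a Python `frozenset` of closed intervals `(lo, hi)` with integer bounds; the names `oplus`, `ONE`, and `ZERO` are chosen for this illustration only.

```python
from typing import FrozenSet, Tuple

Interval = Tuple[int, int]        # closed interval [lo, hi]
Pred = FrozenSet[Interval]        # finite union of closed intervals

ONE: Pred = frozenset({(0, 0)})   # the set containing only zero
ZERO: Pred = frozenset()          # the empty set


def oplus(p: Pred, q: Pred) -> Pred:
    """Pairwise interval sum: every interval of p is added to every interval of q."""
    return frozenset((x1 + y1, x2 + y2) for (x1, x2) in p for (y1, y2) in q)


# pred1 = x U y with x = [1, 2], y = [4, 5]; pred2 = z with z = [10, 20].
pred1: Pred = frozenset({(1, 2), (4, 5)})
pred2: Pred = frozenset({(10, 20)})
result = oplus(pred1, pred2)      # [11, 22] U [14, 25]
```

The encoding is purely syntactic: overlapping intervals are not merged, so two different sets of intervals may denote the same set of time values.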

By this discussion, we have the following theorem:



Theorem 1 The binary operators $\oplus$ and $\cup$ on $Pred$ constitute a commutative
ring with the unit element $\mathbf{1} \stackrel{def}{=} \{0\}$ (the set containing only zero) and the
zero element $\mathbf{0} \stackrel{def}{=} \emptyset$ (the empty set). That is, the following hold:

For any $t, s, u \in Pred$,

1. $t \oplus s = s \oplus t$ (commutative law)
2. $t \oplus (s \oplus u) = (t \oplus s) \oplus u$ (associative law)
3. $t \oplus \mathbf{1} = t$ (existence of the unit element)
4. $(t \cup s) \oplus u = t \oplus u \cup s \oplus u$ (distribution law)
5. $t \cup \mathbf{0} = t$, $t \oplus \mathbf{0} = \mathbf{0}$ (existence of the zero element)
6. The binary operator $\cup$ on $Pred$ constitutes a commutative group with the
   (additive) unit element of $\mathbf{0}$. □



Definition 8 For any $x = \{t \mid x_1 \le t \le x_2\} \in \mathcal{K}$, we define a unary
operator $\mathcal{L}$ as follows:

$$\mathcal{L}(x) \stackrel{def}{=} \{t \mid kx_1 \le t \le kx_2,\ k \in \mathbb{N} \cup \{0\}\} = \{0\} \cup \{t \mid kx_1 \le t \le kx_2,\ k \in \mathbb{N}\} \in Pred.$$

Moreover, in order to extend $\mathcal{L}$ to an operation on $Pred$, we define,
for $x = \{t \mid x_1 \le t \le x_2\}$, $y = \{t \mid y_1 \le t \le y_2\} \in \mathcal{K}$,
$\mathcal{L}(x \cup y) \stackrel{def}{=} \mathcal{L}(x) \oplus \mathcal{L}(y) \in Pred$.

We refer to the operator $\mathcal{L}$ as a loop operator. □

Intuitively, $\mathcal{L}(x)$ represents the set of all possible amounts of time consumed by
a loop with real-time constraint $x$. From Theorem 1, the binary operator $\oplus$ on $Pred$
is commutative, and the operator $\cup$ is also commutative. Thus, the expression
$\mathcal{L}(x) \oplus \mathcal{L}(y)$ represents the set of possible amounts of time consumed by a loop
with real-time constraint $x$ and another loop with real-time constraint $y$. On the other
hand, $\mathcal{L}(x \cup y)$ represents the set of amounts of time consumed by a loop with either
real-time constraint $x$ or $y$. This coincides with what $\mathcal{L}(x) \oplus \mathcal{L}(y)$ represents.
Thus we can safely apply the operator $\mathcal{L}$ without taking care of the order of
application, that is, applying $\cup$ on $x$ and $y$ before applying $\mathcal{L}$ is equivalent to
applying $\mathcal{L}$ on $x$ and $y$ independently before applying $\oplus$.

Fig. 6. Abstraction for Sequential Structures
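Membership in $\mathcal{L}(x)$ for a single interval with concrete bounds can be decided without enumerating $k$. The sketch below assumes finite numeric bounds $0 \le x_1 \le x_2$ (parameters already instantiated); the function name `in_loop` is hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def in_loop(t, x1, x2) -> bool:
    """Return True iff k*x1 <= t <= k*x2 for some integer k >= 0."""
    t, x1, x2 = Fraction(t), Fraction(x1), Fraction(x2)
    if not (0 <= x1 <= x2):
        raise ValueError("expected 0 <= x1 <= x2")
    if t < 0:
        return False
    if t == 0:
        return True            # k = 0: the loop body is never executed
    if x2 == 0:
        return False           # iterations take no time, so t > 0 is unreachable
    k_min = ceil(t / x2)       # smallest k with t <= k * x2
    if x1 == 0:
        return True            # every k >= k_min satisfies 0 <= t <= k * x2
    k_max = floor(t / x1)      # largest k with k * x1 <= t
    return k_min <= k_max
```

For example, with $x = \{t \mid 2 \le t \le 3\}$ the value $t = 4$ is reachable with two iterations, whereas $t = 7/2$ is not reachable with any number of iterations.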



4.2 Abstraction Rules for Parametric Time-Interval Automata 

The abstraction rules are as follows:

1. abstraction for sequential structures 

2. abstraction for loop structures 

3. abstraction for branching structures 



Abstraction for Sequential Structures. The abstraction for sequential
structures is illustrated in Fig. 6. In Fig. 6, from the control state $s_0$, the
internal transition $\tau$ is executable at time $t_1$ which satisfies the time constraint
$P = \{t_1 \mid x_1 \le t_1 \le x_2\}$. Similarly, from the control state $s_1$, the observable
transition $a \in Act$ is executable at time $t_2$ which satisfies the time constraint
$Q = \{t_2 \mid y_1 \le t_2 \le y_2\}$. In such a case, where consecutive transitions
contain an internal action $\tau$, we eliminate the internal transition and add an
observable transition from $s_0$ to $s_2$ by the action $a$. In this transformation, the
time constraint is also modified into $P \oplus Q = \{t \mid x_1 + y_1 \le t \le x_2 + y_2\}$. In case
there are some other outgoing/incoming transitions on $s_1$, we make two copies
$s_1'$ and $s_1''$ of the control state $s_1$ and move all outgoing (incoming) transitions
of $s_1$ into $s_1'$ ($s_1''$, respectively). Furthermore, we copy every outgoing transition
of $s_1$ to $s_1''$ (see Fig. 7). Note that the equivalence between $s_1$ and the copied
state $s_1''$ is clearly preserved since all the future transitions from $s_1$ are preserved
in $s_1''$. As for $s_1'$, the equivalence is not preserved but $s_1'$ will be eliminated in the
further application of this abstraction rule, while preserving the global timed
bisimulation equivalence at $s_0$.
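Because the bounds of $P$ and $Q$ are linear expressions over parameters, the sum $P \oplus Q$ can be formed by adding the lower bounds and the upper bounds coefficient by coefficient. The following sketch assumes a hypothetical encoding in which a linear expression is a dictionary from parameter names to integer coefficients (the key `""` holds the constant term), and it covers only the simple case where $s_1$ has no other incoming or outgoing transitions.

```python
from typing import Dict, Tuple

# A linear expression: parameter name -> coefficient; the key "" is the constant.
LinExpr = Dict[str, int]
Guard = Tuple[LinExpr, LinExpr]        # (lower bound, upper bound)
Edge = Tuple[str, str, Guard, str]     # (source, action, guard, target)

TAU = "tau"  # label used in this sketch for the internal action


def add_expr(e1: LinExpr, e2: LinExpr) -> LinExpr:
    """Add two linear expressions coefficient-wise, dropping zero coefficients."""
    out = dict(e1)
    for name, coeff in e2.items():
        out[name] = out.get(name, 0) + coeff
    return {name: c for name, c in out.items() if c != 0}


def merge_sequential(first: Edge, second: Edge) -> Edge:
    """Replace s0 --tau[P]--> s1 --a[Q]--> s2 by s0 --a[P (+) Q]--> s2."""
    s0, a1, (lo1, hi1), s1 = first
    src2, a2, (lo2, hi2), s2 = second
    if a1 != TAU or s1 != src2:
        raise ValueError("expected an internal edge followed by an edge from its target")
    return (s0, a2, (add_expr(lo1, lo2), add_expr(hi1, hi2)), s2)


# Usage: P = [x1, x2] and Q = [y1, 5] give P (+) Q = [x1 + y1, x2 + 5].
internal: Edge = ("s0", TAU, ({"x1": 1}, {"x2": 1}), "s1")
observable: Edge = ("s1", "a", ({"y1": 1}, {"": 5}), "s2")
merged = merge_sequential(internal, observable)
```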
Fig. 7. Abstraction for Sequential Structures in Presence of Outgoing/Incoming Transitions

Fig. 8. Abstraction for Loop Structures



Abstraction for Loop Structures. The abstraction for loop structures is
illustrated in Fig. 8. In this abstraction, we focus on loops which contain internal
transitions only^1.

In Fig. 8, there exists a loop which contains one internal transition whose time
constraint is $P = \{t_1 \mid x_1 \le t_1 \le x_2\}$. We abstract the loop using the operator
$\mathcal{L}$ defined in Definition 8. Specifically, we consider that the loop is executed
some $k$ times. As a result, the loop and the next observable transition by an
action $a \in Act$ with the time constraint $Q = \{t_2 \mid y_1 \le t_2 \le y_2\}$ are abstracted to
the observable transition by the action $a$ with the time constraint
$\mathcal{L}(P) \oplus Q = \{t \mid \exists k \in \mathbb{N} \text{ s.t. } kx_1 + y_1 \le t \le kx_2 + y_2\}$.
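As a purely illustrative instance (the numbers are not taken from the paper), take $P = \{t \mid 2 \le t \le 3\}$ and $Q = \{t \mid 1 \le t \le 1\}$, and let $k$ range over $\mathbb{N} \cup \{0\}$ as in Definition 8, so that zero iterations are allowed:

```latex
\begin{align*}
\mathcal{L}(P) \oplus Q
  &= \bigcup_{k \ge 0} \{\, t \mid 2k + 1 \le t \le 3k + 1 \,\} \\
  &= [1,1] \cup [3,4] \cup [5,7] \cup [7,10] \cup [9,13] \cup \cdots \\
  &= \{1\} \cup [3,4] \cup [5,\infty).
\end{align*}
```

The last step uses the fact that the interval for $k$ reaches the interval for $k+1$ as soon as $3k + 1 \ge 2(k+1) + 1$, that is, for $k \ge 2$. Under these assumed values, the abstracted transition can fire at time 1, at any time in $[3,4]$, or at any time from 5 onwards, but not strictly between 1 and 3 or strictly between 4 and 5.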



Abstraction for Branching Structures. The abstraction for branching
structures is illustrated in Fig. 9. It is clear that no external observer can
find which branch is selected if the observable action sequences and the
corresponding time constraints are the same. Thus, we leave just one of these
sequences^2. If there are some outgoing/incoming transitions at some eliminated
control state, we move those transitions to the corresponding control state of
the branch which is left.

Fig. 9. Abstraction for Branching Structures

Formally, the abstraction rules are defined as follows:

Definition 9 A set of abstraction rules for PTIA is defined as a transformation
function $Abs$ on PTIA as follows:

1. $\{ s_0 \xrightarrow{\tau@?t[P]} s_1 \xrightarrow{a@?t[Q]} s_2 \} \stackrel{Abs}{\Longrightarrow} \{ s_0 \xrightarrow{a@?t[P \oplus Q]} s_2 \}$, where $a \in Act \cup \{\tau\}$, $P, Q \in Pred$.

   If there is some other outgoing transition $s_1 \xrightarrow{\beta@?t[R_1]} s_2'$ from $s_1$, then we create
   a new state $s_1'$ and transitions $s_0 \xrightarrow{\tau@?t[P]} s_1'$ and $s_1' \xrightarrow{\beta@?t[R_1]} s_2'$ before applying
   this rule. Similarly, if there is some other incoming transition
   $s_0' \xrightarrow{\gamma@?t[R_2]} s_1$ at $s_1$, then we create a new state $s_1''$ and transitions
   $s_0' \xrightarrow{\gamma@?t[R_2]} s_1''$ and $s_1'' \xrightarrow{a@?t[Q]} s_2$ before applying this rule.

2. $\{ s_0 \xrightarrow{\tau@?t[P]} s_0 \wedge s_0 \xrightarrow{a_1@?t[Q_1]} s_1 \wedge s_0 \xrightarrow{a_2@?t[Q_2]} s_2 \wedge \cdots \wedge s_0 \xrightarrow{a_n@?t[Q_n]} s_n \}$
   $\stackrel{Abs}{\Longrightarrow} \{ s_0 \xrightarrow{a_1@?t[\mathcal{L}(P) \oplus Q_1]} s_1 \wedge s_0 \xrightarrow{a_2@?t[\mathcal{L}(P) \oplus Q_2]} s_2 \wedge \cdots \wedge s_0 \xrightarrow{a_n@?t[\mathcal{L}(P) \oplus Q_n]} s_n \}$,
   where $a_1, a_2, \ldots, a_n \in Act \cup \{\tau\}$, $P, Q_1, Q_2, \ldots, Q_n \in Pred$.

3. $\{ s_0 \xrightarrow{\alpha_1@?t[P_1]} s_1 \xrightarrow{\alpha_2@?t[P_2]} s_2 \xrightarrow{\alpha_3@?t[P_3]} \cdots \xrightarrow{\alpha_n@?t[P_n]} s_n \wedge s_0 \xrightarrow{\beta_1@?t[Q_1]} s_1' \xrightarrow{\beta_2@?t[Q_2]} s_2' \xrightarrow{\beta_3@?t[Q_3]} \cdots \xrightarrow{\beta_n@?t[Q_n]} s_n' \}$
   $\stackrel{Abs}{\Longrightarrow} \{ s_0 \xrightarrow{\alpha_1@?t[P_1]} s_1 \xrightarrow{\alpha_2@?t[P_2]} s_2 \xrightarrow{\alpha_3@?t[P_3]} \cdots \xrightarrow{\alpha_n@?t[P_n]} s_n \}$,
   where $\alpha_1, \alpha_2, \ldots, \alpha_n, \beta_1, \beta_2, \ldots, \beta_n \in Act \cup \{\tau\}$ and $\alpha_i = \beta_i$ and $P_i = Q_i$ for
   all $i \in \{1, \ldots, n\}$.

   If there is some other outgoing transition $s_i' \to s''$ from $s_i'$ $(2 \le i \le n)$,
   then we replace it with $s_i \to s''$. The case of an incoming transition to $s_i'$ is
   similar. □

^1 Note that we assume fairness on execution of unobservable (internal) loops, that is,
we assume that any system executes such a loop for some finite number of times and
eventually exits.

^2 Note that the abstraction rule for branching structures is essentially not required
for eliminating internal actions; they can be eliminated by using only the abstraction
rules for sequential and loop structures. The abstraction rule for branching
structures is just for reducing the complexity (size) of the model. It is also possible
to eliminate branches having different time constraints by using the operator "$\cup$".
However, here we do not include such a rule since it does not generally reduce the
essential complexity of the model (for example, $s \xrightarrow{a@?t[P_1 \cup P_2]} s'$ is always the same
as $s \xrightarrow{a@?t[P_1]} s' \wedge s \xrightarrow{a@?t[P_2]} s'$).

4.3 Correctness of Abstraction Rules

In this section, we show that the abstraction rules defined in Section 4.2 preserve 
global timed bisimulation equivalence. 

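We consider the abstraction function $Abs()$ as the mapping from a PTIA $M$
to the modified PTIA $M'$ obtained by applying one of the abstraction rules in
Definition 9. We also consider $Abs()$ as the mapping from control states of $M$ to the
corresponding states of $M'$. We omit the proofs due to lack of space.

Lemma 1. If $s \xrightarrow{\tau@?t[P]} s_1 \xrightarrow{a@?t[Q]} s_2$, $s' \xrightarrow{a@?t[P \oplus Q]} s_2$, and there are no other
outgoing transitions from the control states $s$, $s'$, and $s_1$, then for any $\sigma$,
$(s, \sigma) \equiv_{twb} (s', \sigma)$.

(proof sketch) Routine from Definitions 4 and 7. □



Theorem 2 If $s \xrightarrow{\tau@?t[P]} s_1 \xrightarrow{a@?t[Q]} s_2$, then for any $\sigma$, $(s, \sigma) \equiv_{gtb} (Abs(s), \sigma)$.

(proof sketch) If there are no other outgoing transitions at $s_1$, the theorem
immediately holds from Lemma 1 and Proposition 1. Suppose that some outgoing
transition $s_1 \xrightarrow{\beta@?t[R_1]} s_2'$ exists. From Definition 9, the path from $s$ to $s_2'$ is
copied into $s \xrightarrow{\tau@?t[P]} s_1'$ and $s_1' \xrightarrow{\beta@?t[R_1]} s_2'$ before applying the abstraction rule.
Thus, from Lemma 1 and Definition 6, we can prove that global timed bisimulation
is preserved. □



Since the other cases are similar, here we only show the results.



Theorem 3 If $s_0 \xrightarrow{\tau@?t[P]} s_0 \wedge s_0 \xrightarrow{a_1@?t[Q_1]} s_1 \wedge s_0 \xrightarrow{a_2@?t[Q_2]} s_2 \wedge \cdots \wedge s_0 \xrightarrow{a_n@?t[Q_n]} s_n$,
then for any $\sigma$, $(s_0, \sigma) \equiv_{gtb} (Abs(s_0), \sigma)$. □



Theorem 4 If $s_0 \xrightarrow{\alpha_1@?t[P_1]} s_1 \xrightarrow{\alpha_2@?t[P_2]} s_2 \xrightarrow{\alpha_3@?t[P_3]} \cdots \xrightarrow{\alpha_n@?t[P_n]} s_n$ and
$s_0 \xrightarrow{\alpha_1@?t[P_1]} s_1' \xrightarrow{\alpha_2@?t[P_2]} s_2' \xrightarrow{\alpha_3@?t[P_3]} \cdots \xrightarrow{\alpha_n@?t[P_n]} s_n'$, then for any $\sigma$,
$(s_0, \sigma) \equiv_{gtb} (Abs(s_0), \sigma)$. □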


4.4 Terminating Property of Abstraction Algorithm 

Our proposed abstraction algorithm is to apply repeatedly the abstraction rules 
in Section 4.2 until no changes occur. In this section, we show that this abstrac- 
tion algorithm is ensured to terminate. 

Firstly, we define the abstraction algorithm more precisely. 

Definition 10 The abstraction algorithm is defined as follows:

1. Input PTIA $M$.

2. Apply Abstraction Rule for Sequential Structures to $M$.

3. Apply Abstraction Rule for Loop Structures to $M$.

4. Apply Abstraction Rule for Branching Structures to $M$.

5. Repeat (2)-(4) until no changes occur in $M$.

6. Output PTIA $M$. □
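The repeat-until-no-change structure of Definition 10 can be sketched as a small fixed-point driver. This is an illustrative sketch under strong simplifying assumptions, not the paper's implementation: guards have concrete integer bounds, a model is just a set of edges, only the sequential rule is implemented, and it fires only when the intermediate state has exactly one incoming and one outgoing edge. The names `sequential_rule` and `abstract` are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

TAU = "tau"  # label used in this sketch for the internal action
Edge = Tuple[str, str, int, int, str]   # (source, action, lower, upper, target)
Model = FrozenSet[Edge]
Rule = Callable[[Model], Model]


def sequential_rule(model: Model) -> Model:
    """Merge one internal edge s0 -> s1 with the single edge leaving s1.

    Simplification: applies only if s1 has exactly one incoming and one
    outgoing edge, so no state copying is needed.
    """
    for first in sorted(model):
        s0, a1, lo1, hi1, s1 = first
        if a1 != TAU or s0 == s1:
            continue
        incoming = [e for e in model if e[4] == s1]
        outgoing = [e for e in model if e[0] == s1]
        if len(incoming) == 1 and len(outgoing) == 1:
            _, a2, lo2, hi2, s2 = outgoing[0]
            merged = (s0, a2, lo1 + lo2, hi1 + hi2, s2)
            return (model - {first, outgoing[0]}) | {merged}
    return model


def abstract(model: Model, rules: Sequence[Rule]) -> Model:
    """Apply the given rules in order, repeatedly, until none changes the model."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_model = rule(model)
            if new_model != model:
                model, changed = new_model, True
    return model


# Usage: s0 --tau[1,2]--> s1 --tau[0,1]--> s2 --a[3,4]--> s3
chain: Model = frozenset({
    ("s0", TAU, 1, 2, "s1"),
    ("s1", TAU, 0, 1, "s2"),
    ("s2", "a", 3, 4, "s3"),
})
result = abstract(chain, [sequential_rule])
```

In this sketch each successful application removes at least one edge, so the driver terminates; rules for loop and branching structures would be passed as additional entries of `rules`.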

Then, the following theorem holds. 

Theorem 5 For any PTIA $M$, there exists some natural number $n$ such that
$Abs^{n}(M)$ contains no internal transitions. Here, $Abs^{n}(M)$ means the PTIA to
which the abstraction rules are applied $n$ times.

(proof sketch) From Definition 9, it can be proven that the function $Abs()$
generally monotonically decreases the number of internal transitions. Moreover,
it can be shown that any internal transition can be eliminated by the proposed
abstraction rules, no matter which context it occurs in. From the facts above, we
can prove the theorem. The details of the proof are omitted due to lack of space.

□ 



From this theorem, the following corollary immediately holds. 

Corollary 1 The abstraction algorithm in Definition 10 eventually terminates 
for any input M . □ 



5 Equivalence Checking 

In this section, we show that parametric global timed bisimulation equivalence
checking on PTIA is reduced to parametric timed strong bisimulation checking on PTIA
without internal transitions. We also show a method to check timing attack
vulnerability for dense time models as an application of global timed bisimulation
equivalence checking.



5.1 Reducing Global Timed Bisimulation Checking into Timed 
Strong Bisimulation Checking 

By applying the algorithm of Definition 10 to two PTIAs $M_1$ and $M_2$, we obtain
two PTIAs $Abs(M_1)$ and $Abs(M_2)$, which have no internal transitions and are
global timed bisimulation equivalent to $M_1$ and $M_2$, respectively. On the other
hand, from the result of Ref. [3], we can obtain the parameter condition under
which $Abs(M_1)$ and $Abs(M_2)$ are timed strong bisimulation equivalent. Since
timed strong bisimulation equivalence implies global timed bisimulation equivalence,
and global timed bisimulation equivalence satisfies the transitive law, the
obtained parameter condition is also a parameter condition under which $M_1$
and $M_2$ are global timed bisimulation equivalent.

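Definition 11 A binary relation $R$ on states of a timed LTS is a timed strong
bisimulation if the following condition holds:

If $(s_1, \sigma_1) R (s_2, \sigma_2)$, then for any $a \in Act \cup \mathbb{R}^{+} \cup \{\tau\}$,

1. $\forall s_1', \sigma_1'$ [ $(s_1, \sigma_1) \xrightarrow{a} (s_1', \sigma_1') \Longrightarrow \exists s_2', \sigma_2'$ [ $(s_2, \sigma_2) \xrightarrow{a} (s_2', \sigma_2') \wedge (s_1', \sigma_1') R (s_2', \sigma_2')$ ] ],
   and,

2. $\forall s_2', \sigma_2'$ [ $(s_2, \sigma_2) \xrightarrow{a} (s_2', \sigma_2') \Longrightarrow \exists s_1', \sigma_1'$ [ $(s_1, \sigma_1) \xrightarrow{a} (s_1', \sigma_1') \wedge (s_1', \sigma_1') R (s_2', \sigma_2')$ ] ]

We say that states $(s_1, \sigma_1)$ and $(s_2, \sigma_2)$ are timed strong bisimulation
equivalent, denoted by $(s_1, \sigma_1) \equiv_{tsb} (s_2, \sigma_2)$, if and only if there exists a
timed strong bisimulation $R$ such that $(s_1, \sigma_1) R (s_2, \sigma_2)$.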



The following relationship holds among timed strong bisimulation, timed 
weak bisimulation, and global timed bisimulation. 

Proposition 2 For any two states $(s_1, \sigma_1)$ and $(s_2, \sigma_2)$ of a timed LTS,
$(s_1, \sigma_1) \equiv_{tsb} (s_2, \sigma_2)$ implies $(s_1, \sigma_1) \equiv_{twb} (s_2, \sigma_2)$, and
$(s_1, \sigma_1) \equiv_{tsb} (s_2, \sigma_2)$ implies $(s_1, \sigma_1) \equiv_{gtb} (s_2, \sigma_2)$. □

From the discussions above, the following theorem holds.

Theorem 6 For any PTIAs $M_1$ and $M_2$, if there exist some natural numbers
$n$ and $m$ such that $Abs^{n}(M_1)$ and $Abs^{m}(M_2)$ contain no internal transitions,
then $Abs^{n}(M_1) \equiv_{tsb} Abs^{m}(M_2)$ if and only if $M_1 \equiv_{gtb} M_2$. □



5.2 Application to Check Timing Attacks on Web Vulnerability 

A Timing Attack on the Web [12] is described as follows: (1) A malicious web site
administrator puts an applet on his site. (2) When the applet is downloaded by
a victim and executed, the applet enables the malicious site to observe whether
a certain web site is registered in the victim's web cache, indicating a recent visit.
Such an attack violates the privacy of the unwitting target. This history can be
determined by focusing on the difference in response time between a cache hit and
a cache miss for a certain web path or web page file.

According to [13], timing attack vulnerability can be checked by the following
equivalence checking:

For any $P$, $(B \backslash H) \equiv_{twb} ((B \| P) \backslash H)$ holds, where $B$ denotes an
automaton that models the applet behavior running on the victim's web browser,
$P$ denotes an automaton that models the victim's web cache, $H$ denotes
the set of actions that operate the web cache, $B \| P$ denotes the parallel
composition of $B$ and $P$, and $B \backslash H$ denotes the same behavior as $B$,
except that actions in $H$ are considered unobservable.

It is sufficient to check the case that $P$ represents the cache where the web
site concerned is always hit, as described in [13]. Since there are no proposals
for checking timed weak bisimulation of timed processes under the dense-time
domain, only checking under the discrete time domain is possible. However, the
real world is the dense-time domain. By using the global timed bisimulation
checking algorithm proposed in this paper, we can check under the dense-time
domain.

However, in the proposed global timed bisimulation equivalence checking, the 
obtained parameter condition may contain a parameter variable representing the 
number of iterations of loop as a coefficient of a time parameter variable (used in 
the lower/upper bound of some real-time constraint). Thus, the parameter con- 
dition is not in a class of Presburger Arithmetic. Nonetheless, we can cope with 
this problem by performing the verification as follows: First, negate the derived 
parameter condition. Secondly, replace each product term containing a time pa- 
rameter variable and some loop iteration parameter variables as its coefficient 
with a new variable without a coefficient. Finally, by checking satisfiability of 
the obtained formula, we can check whether the given model is timing attack 
vulnerable, since satisfiability is preserved in this replacement. Therefore, our 
proposed method is applicable to timing attack vulnerability checking. 
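The replacement step described above (the second step of the procedure) can be sketched as a simple renaming. The encoding is an assumption of this illustration: each term of the parameter condition is a pair consisting of the loop-iteration variables that act as its coefficient and the time parameter variable, and fresh variables are named `z0`, `z1`, and so on. Negation of the condition and the satisfiability check itself are not shown.

```python
from typing import Dict, List, Tuple

# A term: (loop-iteration variables used as coefficient, time parameter variable).
Term = Tuple[Tuple[str, ...], str]


def replace_products(terms: List[Term]) -> Tuple[List[str], Dict[Term, str]]:
    """Replace every product term by a fresh variable without a coefficient.

    Identical products are mapped to the same fresh variable.
    """
    fresh: Dict[Term, str] = {}
    renamed: List[str] = []
    for loop_vars, time_param in terms:
        if not loop_vars:
            renamed.append(time_param)        # already linear, kept as is
            continue
        key = (tuple(sorted(loop_vars)), time_param)
        if key not in fresh:
            fresh[key] = f"z{len(fresh)}"     # hypothetical fresh-variable naming
        renamed.append(fresh[key])
    return renamed, fresh


# Usage: the terms k*x1, y1, k*x1, and k*m*x2.
terms: List[Term] = [(("k",), "x1"), ((), "y1"), (("k",), "x1"), (("k", "m"), "x2")]
renamed, table = replace_products(terms)
```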

6 Conclusion 

In this paper, we proposed a time-interval automaton and its transformation al- 
gorithm to eliminate internal actions while preserving global timed bisimulation, 
and showed that global timed bisimulation equivalence checking on time-interval 
automata can be reduced to the existing timed strong bisimulation equivalence 
checking method without internal transitions. We also showed that our proposed 
method can be applied to the timing attack vulnerability checking in the dense 
time domain setting. On the other hand, the proposed transformation algorithm 
may be applicable to the model abstraction technique for the parametric model 
checking of timed automata. Therefore, as future work, we are planning to show 
that some useful class of real-time temporal logic exists such that it cannot 
distinguish two global timed bisimilar models. It is also future work to cope 
with a time parameter variable that has a loop iteration parameter variable as 
a coefficient.

Abstract. Symbolic model-checking usually includes two steps: the
building of a compact representation of a state graph and the evaluation
of the properties of the system upon this data structure. In case of
properties expressed with a linear time logic, it appears that the second
step is often more time consuming than the first one. In this work, we
present a mixed solution which builds an observation graph represented
in a non-symbolic way but where the nodes are essentially symbolic
sets of states. Due to the small number of events to be observed in
a typical formula, this graph has a very moderate size and thus the
time complexity of verification is negligible w.r.t. the time to build the
observation graph. Thus we propose different symbolic implementations
for the construction of the nodes of this graph. The evaluations we have
done on standard examples show that our method outperforms the pure
symbolic methods, which makes it attractive.

Keywords: OBDD, Model Checking, Abstraction 



1 Introduction 

Checking properties of a dynamic system often leads to the building of a graph
corresponding either to the state graph of the system or to some synchronized
product of it with an automaton. In both cases, one has to tackle the state
explosion problem, i.e. the exponential increase in the number of states w.r.t.
the number of components of the system.

Among the numerous techniques proposed to cope with such an explosion,
the ordered binary decision diagrams (OBDDs) approach can be described as
follows [1,2]. Each potential state is viewed as a vector of boolean variables by
choosing the appropriate variables describing the system. Then the set of reachable
states is equivalent to the boolean function which returns true iff the input
vector corresponds to a reachable state. The boolean expression associated with
the function can now be represented in a compact way by factorising the multiple
occurrences of the same subexpression. Hence the final structure is a rooted directed
acyclic graph (DAG) where the subgraph rooted at each node corresponds

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 196-210, 2004. 

© Springer-Verlag Berlin Heidelberg 2004
to a subexpression and the root corresponds to the function to be represented.
In the case of multiple functions there is one “root” per function (the
structure is then sometimes called a shared OBDD).

The benefit of OBDDs comes from the fact that a small OBDD can often represent
a huge set of states, and the “symbolic” operations like the set operations
(union, intersection, complementation) and the membership test are cheap as
long as the OBDDs are small. Equally important are the operations associated
to an event of the system and a set of states: the subset of states for which this
event is enabled, the “image” of this set obtained by the occurrence of the event
and the “preimage” of the set, i.e. the set of states where the occurrence of this
event leads to a state of the specified set. Generally these latter algorithms have
a time complexity proportional to the size of the OBDD on which they are
applied.

Once the OBDD has been built, the reachability problem is straightforwardly
answered by the membership test. However checking a temporal formula requires
a more complex algorithm. It is shown in [3] that checking CTL and LTL formulae
can be essentially reduced to the search of particular cycles in either the
reachability graph or in some synchronized product. Thus the key point for
efficient OBDD-based formula checking is the design of an OBDD algorithm for
the search of such cycles. The earliest algorithm is known as the Emerson-Lei
algorithm [4]. Its worst case time complexity is quadratic w.r.t. the size of the
reachability space (n). So different improvements have been proposed and analyzed
[5]. With a relaxed definition of a symbolic algorithm, the worst case time
complexity has been first reduced to O(n log n) [6], then to O(n) [7]. However,
since these algorithms repeatedly build sets reduced to a singleton, their
empirical complexity is often worse than that of the Emerson-Lei algorithm. In [3], the
authors study variants of the Emerson-Lei algorithm (e.g. CTY, OWCTY) with the
same worst case time complexity but outperforming it on practical examples.

In this paper, we study the checking of an evenemential linear time formula
(a formula over events). We propose a hybrid method which builds:

— a standard representation for the observation graph i.e. the abstraction of 
the reachability graph w.r.t. the events occurring in the formula, 

— an OBDD representation of each node of this graph which is indeed the 
closure of some set of states under the occurrences of unobserved events. 

Once this structure is obtained, a standard model-checking algorithm is ap- 
plied on the observation graph. Even in case of a huge reachability graph, the 
observation graph is quite small and the execution time of this last step is neg- 
ligible w.r.t. the execution time of the building of the observation graph. 

So we have paid attention to an efficient building of the observation graph. 
The critical factor is the “accumulated” size of the OBDD corresponding to the 
sets associated to each node of this graph. Our goal is to reduce both the number 
of the nodes of the graph and the size of the set of states associated to each node. 

— Two nodes corresponding to different sets of states can be regarded as equal if 
they have the same subset of states enabling observed events (thus the same 




S. Haddad, J.-M. Ilié, and K. Klai



successors) and the same behavior w.r.t. the deadlock and the divergence 
properties. 

— Once a new node is built, we replace the corresponding set of states by
the subset consisting of one representative per initial strongly connected
component (SCC) of the subgraph spanned by this set. Indeed these subsets are
sufficient to check equality between the original sets. We call this step the
canonization.

The evaluations on typical cases (see section 3) show that the size of this
structure is either of the same order of magnitude as the size of the OBDD of the
reachability graph or even of a smaller order. More importantly the maximum
size of the intermediate OBDDs is often much smaller than the corresponding
size for the reachability graph. Thus w.r.t. the space complexity our method
achieves its goal.

The critical procedure w.r.t. the time complexity is the canonization step. 
We have observed that a standard symbolic search algorithm of initial SCCs of 
a graph (see for instance [8,6]) may have a bad time complexity. Thus we have 
developed a specific procedure which takes advantage of the parallelism of the 
system under observation. For this kind of systems, our procedure outperforms 
the standard procedure. 

The paper is organized as follows. In the second section, we briefly introduce
the model-checking problem and we develop our method, describing and analyzing
the main algorithms. In the third section, we evaluate our method on a benchmark
of problems. We also briefly discuss how to exhibit counter-examples when
the formula is invalidated. Finally, we conclude and give some perspectives on
this work.



2 The Observation Graph 

2.1 Model Checking of an Evenemential Linear Formula 

We introduce here the context of our model-checking problem. The system we 
want to study is given by: 

— A state description represented by a fixed vector of boolean variables. An 
initial state is associated to the system. 

— A finite set of events T. To each event is associated an identifier, an enabling
predicate on states and a transformation function.

We suppose that the dynamics of the system can be symbolically evaluated by
the following operations: Img(S, t), which returns the set of immediate successors
of the states of S by the occurrence of the event t, and Preimg(S, t), which returns
the set of immediate predecessors of the states of S by the occurrence of the event
t. We derive from these operations and the boolean ones other useful operations
with straightforward interpretations:

$Img(S, T') = \bigcup_{t \in T'} Img(S, t)$

$Preimg(S, T') = \bigcup_{t \in T'} Preimg(S, t)$

$Enab(S, T') = S \cap Preimg(Img(S, T'), T')$

Enab(S, T') selects all states in S which have (at least) one successor by
the occurrence of some event in T'.
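
The three derived operations can be illustrated with a minimal explicit-state sketch. It assumes that plain Python sets stand in for OBDDs and that the transition relation is given as a dictionary `trans` mapping each event to a set of (source, target) pairs; the names `img`, `preimg`, `enab` and the toy system are illustrative choices, not part of the original tool.

```python
# Explicit-state stand-ins for the symbolic operations Img, Preimg and Enab.
# Assumption: a real implementation would manipulate OBDDs, not Python sets.

def img(S, events, trans):
    """States reached from S by one occurrence of an event of `events`."""
    return {dst for t in events for (src, dst) in trans.get(t, ()) if src in S}

def preimg(S, events, trans):
    """States having a successor in S by one occurrence of an event of `events`."""
    return {src for t in events for (src, dst) in trans.get(t, ()) if dst in S}

def enab(S, events, trans):
    """States of S enabling at least one event of `events`."""
    return S & preimg(img(S, events, trans), events, trans)

# Hypothetical 3-state system: 0 -a-> 1 -b-> 2, where state 2 is dead.
trans = {"a": {(0, 1)}, "b": {(1, 2)}}
S = {0, 1, 2}
print(img(S, {"a", "b"}, trans))
print(enab(S, {"a", "b"}, trans))
```

On this toy system, state 2 is excluded from `enab` because it has no successor.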

Our method does not rely on the particular syntax and semantics of a temporal
logic over events but we make two assumptions. The first hypothesis is that, given
the events T' occurring in the formula $\phi$, the satisfaction of $\phi$ by a sequence of
events $\sigma$ depends only on the observed sequence $Proj(\sigma, T')$, the projection of $\sigma$
on the events of T' (the sequence obtained by removing, from $\sigma$, all events not
in T').
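
As a small illustration of the projection, the following sketch assumes that a sequence of events is a Python list of event names; the function name `proj` and the sample sequence are hypothetical.

```python
def proj(sigma, observed):
    """Projection of the event sequence `sigma` on the events of `observed`."""
    return [e for e in sigma if e in observed]

# "u" plays the role of an unobserved event.
sigma = ["a", "u", "b", "u", "u", "a"]
print(proj(sigma, {"a", "b"}))
```

Two sequences with the same projection are indistinguishable for the formula under the first hypothesis.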

So our method builds a finite automaton called the observation graph which
includes enough information to express three kinds of sequences of the system
(i.e. three languages).

— $L^{\infty} = \{\sigma' = Proj(\sigma, T') \in T'^{\infty} \mid \sigma \text{ is an execution sequence}\}$, the infinite
observable sequences,

— $L^{max} = \{\sigma' = Proj(\sigma, T') \mid \sigma \in T^{*} \text{ is a maximal execution sequence}\}$, the
finite maximal sequences,

— $L^{div} = \{\sigma' = Proj(\sigma, T') \in T'^{*} \mid \sigma \in T^{\infty} \text{ is an execution sequence}\}$, the
divergent sequences.

Then the second hypothesis is that such a formula can be checked by the search
for particular paths in some appropriate synchronized product between the
observation graph and an automaton related to the formula. Such a hypothesis
is very standard and covers the case of formula languages like LTL or the linear
time $\mu$-calculus.

In conclusion, by preserving the above three kinds of sequences, the obtained
observation graph preserves the validity of formulas written in classical
Manna-Pnueli linear time logic [9] (LTL) from which the “next” operator (X) has been
removed (see [10,11]). This logic is extremely important in the verification of
concurrent systems. Even if Manna-Pnueli linear time logic is a state-based logic,
an interpretation of this logic in an event-based (or action-based) setting is
possible. This can be done in more than one way. An alternative interpretation that is
perhaps more relevant for practical verification than the original one was given
in [12] (more easily found in [13], pp. 498-499).



2.2 Algorithms 

The algorithm BuildOG builds the (deterministic) observation graph related to
an initial state $s_0$ and a set of observable events $Obs \subseteq T$ (T is the set of all
events of the system). The data structures manipulated by the algorithm are
the following ones:

- a shared OBDD which contains symbolic representations of subsets of the
set of reachable states,
Algorithm 2.1 Building of the observation graph

 1: BuildOG(state s0; Events Obs)
 2:   set S, S'; vertex v, v';
 3:   Vertices V; Edges E;
 4:   Events Unobs = T \ Obs; stack st;
 5:   S = Saturate({s0}, Unobs);
 6:   v.dead = DetectDead(S);
 7:   v.loop = DetectLoop(S, Unobs);
 8:   v.set = Reduce(S, Unobs);
 9:   V = {v}; E = ∅;
10:   st.Push((v, S));
11:   repeat
12:     st.Pop((v, S));
13:     for t ∈ Obs do
14:       S' = Img(S, t);
15:       if (S' ≠ ∅) then
16:         S' = Saturate(S', Unobs);
17:         v'.dead = DetectDead(S');
18:         v'.loop = DetectLoop(S', Unobs);
19:         v'.set = Reduce(S', Unobs);
20:         if (∃ w ∈ V s.t. w == v') then
21:           E = E ∪ {v -t-> w};
22:         else
23:           V = V ∪ {v'};
24:           E = E ∪ {v -t-> v'};
25:           st.Push((v', S'));
26:         end if
27:       end if
28:     end for
29:   until st == ∅;

 1: Saturate(set S, Events Unobs)
 2:   set From, Reach, To;
 3:   From = S; Reach = S;
 4:   repeat
 5:     To = Img(From, Unobs);
 6:     From = To \ Reach;
 7:     Reach = Reach ∪ To;
 8:   until From == ∅;
 9:   return Reach;

 1: DetectDead(set S)
 2:   return Enab(S, T) ≠ S;

 1: DetectLoop(set S, Events Unobs)
 2:   set From, Reach;
 3:   From = S;
 4:   repeat
 5:     Reach = Img(From, Unobs);
 6:     if Reach == From then
 7:       return TRUE;
 8:     end if
 9:     From = Reach;
10:   until Reach == ∅;
11:   return FALSE;



- a standard graph representation with a set of vertices (V) and a set of edges
(E). Three attributes are associated to a node v: a symbolic subset of states
v.set characterizing the behaviour of the system starting from this node, v.loop,
a boolean indicating that any sequence of observable events leading to this
node is the projection of a divergent sequence, and v.dead, a boolean indicating
that any sequence of observable events leading to this node is the projection
of a finite maximal sequence,

- a stack whose items are tuples composed of a node of the graph and a
symbolic subset of states (the interpretation of this set is given below).

The initialization step of the algorithm (lines 5-9) computes the first
(initial) node of the observation graph (lines 5-8) and initializes the graph
structure (line 9). An iteration of the main loop consists in picking and processing
an item (v, S) of the stack, until the stack is empty. The goal of the iteration is to
generate the successors of the current node in the observation graph. The set of
states S corresponds to the states reached by any sequence of observed events
leading from the initial node of the observation graph to v. Thus one successively
computes the image S' of S by each observed event. If S' is not empty, it generates
a new edge of the observation graph labelled by the event.

Now one must check whether the node reached by this edge is a new one.
So we compute the different attributes of this node. First, we compute the
closure of S' under the action of the unobserved events (via Saturate). Then we
check the existence of dead states with the help of the symbolic operation Enab
(via DetectDead) and the existence of a loop of unobserved events by applying
a kind of topological sort of the underlying graph of S' (via DetectLoop). Finally,
we compute from S' a subset of states characterizing the observable behaviour
starting from S' (via Reduce).

Finally, we look for an identical node in the graph. If such a node is not
present we add a new node to the graph and push it on the stack with S'. In
fact, we could avoid pushing S' and retrieve the significant information from v.set
but this would complicate the presentation.
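
The construction described above can be sketched on explicit sets as follows. This is only an illustration under stated assumptions: Python frozensets replace OBDDs, the transition relation `trans` maps each event to a set of (source, target) pairs, and Reduce is replaced by the identity (the whole saturated set identifies a node). All function names and the toy system are hypothetical.

```python
def img(S, events, trans):
    return frozenset(d for t in events for (s, d) in trans.get(t, ()) if s in S)

def preimg(S, events, trans):
    return frozenset(s for t in events for (s, d) in trans.get(t, ()) if d in S)

def saturate(S, unobs, trans):
    """Closure of S under the unobserved events."""
    reach, front = frozenset(S), frozenset(S)
    while front:
        to = img(front, unobs, trans)
        front = to - reach
        reach = reach | to
    return reach

def detect_dead(S, trans):
    """True iff some state of S enables no event at all (Enab(S, T) != S)."""
    all_events = set(trans)
    return (S & preimg(img(S, all_events, trans), all_events, trans)) != S

def detect_loop(S, unobs, trans):
    """True iff the saturated set S contains a cycle of unobserved events."""
    front = frozenset(S)
    while front:
        nxt = img(front, unobs, trans)
        if nxt == front:          # non-empty fixpoint: a cycle exists
            return True
        front = nxt
    return False

def build_og(s0, obs, trans):
    """Explicit-state sketch of BuildOG; a node is (set, dead, loop)."""
    unobs = set(trans) - set(obs)

    def make_node(S):
        # Assumption: Reduce is the identity here, so v.set is S itself.
        return (S, detect_dead(S, trans), detect_loop(S, unobs, trans))

    S = saturate({s0}, unobs, trans)
    v0 = make_node(S)
    vertices, edges, stack = {v0}, set(), [(v0, S)]
    while stack:
        v, S = stack.pop()
        for t in sorted(obs):
            S2 = img(S, {t}, trans)
            if S2:
                S2 = saturate(S2, unobs, trans)
                v2 = make_node(S2)
                edges.add((v, t, v2))
                if v2 not in vertices:
                    vertices.add(v2)
                    stack.append((v2, S2))
    return v0, vertices, edges

# Hypothetical system: 0 -u-> 1, 1 -a-> 2, 1 -a-> 3 (dead), 2 -b-> 0.
trans = {"u": {(0, 1)}, "a": {(1, 2), (1, 3)}, "b": {(2, 0)}}
v0, V, E = build_og(0, {"a", "b"}, trans)
print(len(V), len(E))
```

With the identity in place of Reduce, node equality is simply equality of the saturated sets and of the two flags; the canonization step discussed next is what makes this comparison cheaper in the symbolic setting.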

We devote the next subsection to the presentation of Reduce since its ap- 
plications lead to important memory savings but may involve a large additional 
computation time. 

2.3 Canonization 

In this section, we detail the Reduce function which extracts from a set of states 
S, a subset sufficient to characterize the observed behaviour starting from S. 
At first, since S is closed under the action of the unobserved events, we can 
restrict S to the subset of states enabling any observed event. In practice, this 
first reduction does not lead to memory savings perhaps because such a set has a 
more irregular structure than the original one w.r.t. the OBDD representation. 
So we will not present evaluations of this variant. 

Taking the unobserved events as edges, S may be viewed as a graph and it
is sufficient to extract one representative per initial SCC of this graph in order to
preserve the observed behaviour starting from S.

Whereas a symbolic search for all the SCCs of a graph is a theoretical issue
(see [8,6,7]), the search for the initial SCCs can easily be done in a number of
operations proportional to the number of states, as shown by the algorithm SCAN.
Each iteration of the external loop, starting from a single state Max, computes
its forward closure F and then begins to compute its backward closure B. As
soon as B is no longer included in F we know that Max does not belong to an
initial SCC, so we prune F from the current set R and we start a new iteration
with a state which is a predecessor of Max (MaxPick extracts a singleton set
reduced to the maximal state of a set w.r.t. the lexicographic order induced by the
variables of the OBDD). In the other case, B is an initial SCC including Max, so
we add its representative to the set of representatives, we prune F and we start a
new iteration with an arbitrary remaining state.
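
The behaviour of SCAN just described can be sketched on explicit sets. The sketch assumes that states are comparable Python values (so that `max` plays the role of MaxPick) and that `trans` is a set of (source, target) pairs for the unobserved events; it illustrates the idea and is not the symbolic implementation.

```python
def img(S, trans):
    return {d for (s, d) in trans if s in S}

def preimg(S, trans):
    return {s for (s, d) in trans if d in S}

def scan(S, trans):
    """One representative per initial SCC of the graph (S, trans)."""
    R, reps = set(S), set()
    mx = max(R) if R else None
    while R:
        # Forward closure F of mx inside R.
        F, frt = {mx}, {mx}
        while frt:
            frt = (img(frt, trans) & R) - F
            F |= frt
        # Backward closure B of mx, abandoned as soon as it leaves F.
        B, frt, pred = {mx}, {mx}, set()
        while frt and not pred:
            frt = (preimg(frt, trans) & R) - B
            B |= frt
            pred = frt - F
        R -= F
        if not frt:               # B is included in F: B is an initial SCC
            reps.add(max(B))
            if R:
                mx = max(R)
        else:                     # mx has a predecessor outside F
            mx = max(pred)
    return reps

# Hypothetical graph: {1, 2} is an initial SCC, 4 is an initial singleton,
# and 3 is reachable from both.
trans = {(1, 2), (2, 1), (2, 3), (4, 3)}
print(scan({1, 2, 3, 4}, trans))
```

On a simple chain 1 -> 2 -> 3 the sketch starts from 3 and walks backwards one state per iteration, which is the behaviour that motivates the dichotomic variant.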

Unfortunately, this algorithm has a bad empirical time complexity much 
greater than the complexity of the saturation. Thus we have designed a new 
Algorithm 2.2 Standard and dichotomic canonization

 1: SCAN(set S, Events Un)
 2:   set R, F, B, Frt, Max, Repr, Pred;
 3:   R = S; Repr = ∅; Max = MaxPick(R);
 4:   repeat
 5:     F = Max;
 6:     Frt = F;
 7:     repeat
 8:       Frt = (Img(Frt, Un) ∩ R) \ F;
 9:       F = Frt ∪ F;
10:     until Frt == ∅;
11:     B = Max;
12:     Frt = B;
13:     repeat
14:       Frt = (Preimg(Frt, Un) ∩ R) \ B;
15:       B = Frt ∪ B;
16:       Pred = Frt \ F;
17:     until (Frt == ∅) or (Pred ≠ ∅);
18:     R = R \ F;
19:     if (Frt == ∅) then
20:       Repr = Repr ∪ MaxPick(B);
21:       Max = MaxPick(R);
22:     else
23:       Max = MaxPick(Pred);
24:     end if
25:   until (R == ∅);
26:   return Repr;

 1: DCAN(set S, Events Un, int i)
 2:   set S[0..1], Front, Reach, Repr;
 3:   S[1] = S ∩ ite(x_i, 1, 0);
 4:   S[0] = S \ S[1];
 5:   if (S[0] ≠ ∅) and (S[1] ≠ ∅) then
 6:     Front = Reach = S[1];
 7:     repeat
 8:       Front = Img(Front, Un) \ Reach;
 9:       Reach = Reach ∪ Front;
10:       S[0] = S[0] \ Front;
11:     until (Front == ∅) or (S[0] == ∅);
12:   end if
13:   if (S[0] ≠ ∅) and (S[1] ≠ ∅) then
14:     Front = S[0]; Reach = S[0];
15:     repeat
16:       Front = Img(Front, Un) \ Reach;
17:       Reach = Reach ∪ Front;
18:       S[1] = S[1] \ Front;
19:     until (Front == ∅) or (S[1] == ∅);
20:   end if
21:   i++; Repr = ∅;
22:   for j from 0 to 1 do
23:     if size(S[j]) ≤ 1 then
24:       Repr = Repr ∪ S[j];
25:     else
26:       Repr = Repr ∪ DCAN(S[j], Un, i);
27:     end if
28:   end for
29:   return Repr;


algorithm DCAN adapted to parallel systems. This recursive algorithm splits
the set of states into two subsets (S[1] and S[0]) w.r.t. the value of the current
variable ($x_i$, with i initially equal to 1). It prunes from the second subset all the
states which are in the forward closure of the first subset. Such a deleted state
either does not belong to an initial SCC or the representative of its SCC is in the
first subset. Now S[0] contains states not reachable from S[1]. We now eliminate
the states of S[1] reachable from S[0] since they do not belong to an initial SCC.
After this double reduction, both these subsets may be independently analyzed
in order to find the representatives of the initial SCCs. This leads to at most two
recursive calls. Of course when a set is a singleton, we have found a representative.
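
The dichotomic scheme can be sketched on explicit sets as follows. The sketch assumes that a state is a tuple of bits, that the variable index `i` starts at 0 (instead of 1 as in the text) and that `trans` is a set of (source, target) pairs for the unobserved events; the helper `prune` is a hypothetical name for the forward-closure loop that appears twice in DCAN.

```python
def img(S, trans):
    return {d for (s, d) in trans if s in S}

def prune(src, dst, trans):
    """Remove from dst every state lying in the forward closure of src."""
    front, reach = set(src), set(src)
    while front and dst:
        front = img(front, trans) - reach
        reach |= front
        dst = dst - front
    return dst

def dcan(S, trans, i=0):
    """Representatives of the initial SCCs of (S, trans); states are bit tuples."""
    if len(S) <= 1:
        return set(S)
    S1 = {s for s in S if s[i] == 1}
    S0 = S - S1
    if S0 and S1:
        S0 = prune(S1, S0, trans)   # drop states of S0 reachable from S1
    if S0 and S1:
        S1 = prune(S0, S1, trans)   # drop states of S1 reachable from S0
    return dcan(S0, trans, i + 1) | dcan(S1, trans, i + 1)

# Toy parallel program with m = 3 variables and true encoded by 1:
# each transition sets one bit from 0 to 1.
m = 3
states = {tuple((k >> j) & 1 for j in range(m)) for k in range(2 ** m)}
cube = {(s, s[:j] + (1,) + s[j + 1:]) for s in states for j in range(m) if s[j] == 0}
print(dcan(states, cube))
```

On the cube, each level of the recursion discards one face in a single image step, which matches the informal argument given for the toy program.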

A last improvement is possible: an initial SCC obviously contains a state of
S' in line 14 of the main algorithm before its saturation in line 16. Thus we can
restrict our search to the intersection of the backward and the forward closure
(by the unobserved events) of S' before its saturation.

Even if DCAN has a worse theoretical complexity than SCAN (quadratic
versus linear), it is better suited to parallel systems. To explain this
assertion, let us illustrate the time complexity (number of img and preimg
executions) of these two algorithms on the toy parallel program
$x_1 = true \parallel x_2 = true \parallel \ldots \parallel x_m = true$ with all the variables initialized to false. Depending on the
number of variables, the state space of such a program can be illustrated by a
hypercube. When m = 3, the state space is given by one of the two cubes of
Figure 1 depending on the encoding of true by 1 or 0. This encoding modifies the initial
choice of Max in the SCAN algorithm, which has a time complexity either linear
($m + 2$) or exponential ($2^{m+1}$) w.r.t. m, whereas DCAN has in both cases a linear
complexity. We give below an informal proof of these claims.



Fig. 1. Possible state space graphs of a parallel program



Let us first consider the SCAN algorithm.

- true = 1: The canonization is performed in as many iterations as the number
of cube nodes ($2^m$). At each step a maximum node is picked (111, 110,
101, . . . , 000 successively). By performing one img operation, leading to no successor,
followed by one preimg operation, leading to some predecessors which are not
successors, the picked node is removed from the whole set. The total number
of img and preimg operations is hence $2 \cdot n$ where $n = 2^m$ is the number of
cube nodes.

- true = 0: The canonization is here performed in the first iteration. Taking
the maximal node 111, we perform three img steps to reach all cube nodes
(plus one saturation execution), then one preimg is sufficient to know that
this node has no predecessor. All successors are thus removed and 111 is selected
as the representative of this elementary SCC.

We now consider the DCAN algorithm:

- In both cases (true = 1 and true = 0): The canonization is performed in
m = 3 iterations. At each step the current state space is split into two equal
subsets. At the first iteration, taking the states where $x_1$ has the value true (i.e.
the nodes belonging to one face of the cube) removes in one img step all
remaining nodes (i.e. all nodes belonging to the opposite face). The same
thing is done in the next two iterations, each removing in one step n/2
nodes of the n existing ones. In conclusion, if n is the number of cube
nodes, then we need to accomplish $\log_2(n) = m$ img steps in order to canonize
the whole state space.

Note that when m > 3 the proof is identical. 
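
To make the gap between the two bounds concrete, the following sketch simply tabulates the operation counts stated in the informal proof above ($2 \cdot 2^m$ or $m + 2$ for SCAN, $m$ for DCAN). The function names are illustrative, and the numbers are the counts claimed in the text, not measurements.

```python
def scan_ops(m, true_is_one):
    """img/preimg count stated for SCAN on the m-variable toy program."""
    n = 2 ** m                      # number of hypercube nodes
    return 2 * n if true_is_one else m + 2

def dcan_ops(m):
    """img count stated for DCAN: one step per variable, log2(n) = m."""
    return m

for m in (3, 10, 20):
    print(m, scan_ops(m, True), scan_ops(m, False), dcan_ops(m))
```

For m = 3 this gives 16 operations for SCAN with true = 1, 5 with true = 0, and 3 for DCAN.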

3 Evaluation of the Observation Method 

In this section, we report the results obtained with the observation graph
compared to those obtained with a symbolic approach both for the building and the
analysis of the state space. We also compare the two canonization algorithms
described in section 2. In general, time and memory consumption is very
sensitive to the implementation details of the OBDD tool. Here, our goal is not to
reach the performance of existing tools [14,15], but we want to demonstrate that,
given some software, the construction of the observation graph decreases time
and space consumption with respect to an OBDD-based construction of some
state space followed by a symbolic search of fair cycles. Therefore, we have
developed our algorithms with the free package BuDDy [16] for both the symbolic
observation graph construction and the whole state space construction. The
implementation of the Img and Preimg operations is similar to the one of [17]. All
the tested examples are parameterized and the size of the reachable state space
is exponential with respect to the parameter value. For each example, there
are two observed events occurring in a linear time formula expressing a fairness
property.

The following three examples were used: 

1. Dining philosophers: The system consists of a ring of n dining philosophers. Each
philosopher has two variables that model the state of his left and right forks (up
or down). A philosopher first picks up his left fork, then his right, then, after he
finishes eating, puts down his left, and finally his right, returning to his initial
state. A fork can only be picked up if the neighbor that shares the fork is not
using it.

2. Distributed database: The system consists of a database distributed among n
sites (each site has a copy of the database). Its modifications are done in mutual
exclusion. After such an operation, the site broadcasts its transaction. Upon
reception, the other sites update their local copy and send back a grant message.
When all the grants are received, the database is released.

3. Slotted ring: The system models a protocol for local area networks called slotted
ring [17]. The network is composed of n nodes. Each node shares two events
with its neighbor: $Free_{(i+1) \bmod n}$ and $Used_{(i+1) \bmod n}$.
Table 1. Experimental results

              OBDD                    SOG
                                                                          Std. canonization       Dich. canonization
 n  # states  # vert  # op  CPU time  # MS  # edges  # rep  # op  # vert  Const (sec)  Can (sec)  Const (sec)  Can (sec)

Dining philosophers
 2        22      56     6         0     2        2      2    13      18            0          0            0          0
 5    2×10^3     175    13         0     2        2      2    21      39            0          0            0          0
10    4×10^6     376    26         2     2        2      2    34      74           12         10            1          0
20   2×10^13     775    51        51     2        2      2    59     144           3^        326           30          7
30   1×10^20    1175    76       300     2        2      2    84     214         2578       2425          186         33

Distributed database
 2         7      32     4         0     2        2      2    11      21            0          0            0          0
 5       406     104    10         0     2        2      2    29      51            0          0            0          0
10    1×10^5     224    20         0     2        2      2    59     101            1          0            0          0
20   2×10^10     464    40         6     2        2      2   119     201           46         33           24         11
30   2×10^15     704    60        35     2        2      2   179     301          278        198          150         68

Slotted ring
 2        52      72    11         0     2        3      2    28      26            0          0            0          0
 5    5×10^4     340    30         1     2        3      2    83      74            7          6            4          3
10    8×10^9    1280   115       377     2        3      2   215     154            -          -       888.58        759




Table 1 reports performance results obtained for both the symbolic
reachability space and the observation graph constructions. Its first column
specifies the system parameter (i.e. the number of components). The second one lists
the size of the reachability space. The remaining columns are divided into subsets
of columns corresponding to measurements for the standard OBDD algorithm
and the observation graph. The comparison criteria are the number of symbolic
operations, the CPU time and the number of OBDD vertices. In addition, we
give for each observation graph the number of its vertices and edges and the
sum of the sizes of the sets associated to each vertex (i.e. $\sum_v size(v.set)$). The last
four columns compare the CPU time of the construction depending on the
canonization algorithm used.

The analysis of this table brings out the following statements: 

- The size of the OBDD associated to the observation graph is significantly
smaller than the size of the OBDD associated to the reachability space (e.g.
18% for 30 philosophers, 4% for 10 nodes in the slotted ring, 25% for 30
copies of the database).

- For the three examples, the size of the observation graph is independent of
the value of the parameters. More generally, other experiments have
confirmed that the size of this graph is negligible w.r.t. the OBDD size and
grows slowly w.r.t. the parameter.

- The computation of the initial SCCs leads to drastic differences depending
on the use of SCAN or DCAN. The latter has a smaller time consumption
than the former (1% for 30 philosophers, 34% for 30 copies of the database).
Its time consumption is of the same order as the symbolic building of the
reachability space.

- There are more symbolic operations for the observation graph than for the
reachability space. Combined with the previous statement, one concludes
that the mean size of the manipulated OBDDs is smaller for the former.




Fig. 2. Evolution of intermediary OBDDs sizes for 20 dining philosophers



It is well known that the ratio between the maximal size of the OBDD during
the computation and its final size may be large. So we have analyzed the
size of these intermediary OBDDs. Figure 2 depicts the evolution of the
intermediary OBDD sizes for the dining philosophers example. The first curve
is related to the symbolic reachability space building while the two other ones
correspond to the observation graph (one with two observed events and the other
with four events). We point out that the sizes of the intermediary OBDDs are
also reduced by our algorithm and depend on the number of observed events.
Generally increasing the number of observed events leads to smaller intermediary
OBDDs.

Finally, we compare the complete verification process using the observation
graph with fully symbolic model checking. We have chosen two robust and
efficient symbolic algorithms: the Emerson-Lei and OWCTY algorithms [3]. The
considered formula expresses that a philosopher will never wait indefinitely to
eat.
Table 2. Model checking: SOG vs. Emerson-Lei and OWCTY algorithms

Table 2 lists the measurements. Since the complete symbolic approach
builds an OBDD representation of a synchronized product (thus different from
the reachability space of the system), we have taken into account space and time
comparison criteria.

First the construction of the observation graph consumes less CPU time 
and less memory than the construction of the symbolic synchronized product. 
Moreover, the CPU time consumed by the OWCTY algorithm (resp. Emerson- 
Lei) may be as large as (resp. two times larger than) the one of the whole state 
space construction. 

4 Search of a Counter Example 

Once a given property has been proven not to hold on a system, the modeller
needs to modify the design. To help this process, providing a counter-example
is useful. It should be clear that the verification process based on the
observation graph exhibits a sequence of observed events invalidating the formula.
Such a sequence $\sigma$ corresponds to a path in the observation graph. Sometimes,
this is sufficient for the engineer to correct the design of the system. However,
when the abstraction induced by the unobserved events is too strong, the
modeller needs a sequence of the system whose projection on the observed events
is $\sigma$.

We first handle the case of a finite maximal sequence which corresponds to
a path $v_0 \xrightarrow{o_1} v_1 \cdots \xrightarrow{o_n} v_n$ with $v_n.dead = true$. At first, we develop again the sets
of states $S_i$ corresponding to the nodes $v_i$ and we also memorize $S'_i$, the subset of
$S_i$ corresponding to the set S' in line 14 of the main algorithm. We call $S'_i$ the
entry points of $S_i$.

Using DetectDead, one determines the set of dead states $S_{dead}$ included in
$S_n$. We compute the backward closure of $S_{dead}$, stacking the different fronts until
we reach at least one entry point of $S_n$, say $s_{i_n}$. Then we compute an explicit
subsequence from $s_{i_n}$ leading to a dead state through the fronts which are popped
from the stack. With the $Preimg(\{s_{i_n}\}, o_n) \cap S_{n-1}$ operation, we obtain a transition
$s_{n-1} \xrightarrow{o_n} s_{i_n}$ with $s_{n-1} \in S_{n-1}$. Iterating this backward process starting from $s_{n-1}$
we eventually reach $s_0$ and build the searched sequence.
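
The backward closure with stacked fronts, followed by the forward extraction of an explicit path, can be sketched as follows for a single node. The sketch assumes explicit sets instead of OBDDs, a set `trans` of (source, target) pairs for the unobserved events, and picks the smallest candidate state with `min` where any choice would do; the function name `path_to_dead` is hypothetical.

```python
def img(S, trans):
    return {d for (s, d) in trans if s in S}

def preimg(S, trans):
    return {s for (s, d) in trans if d in S}

def path_to_dead(S, entries, dead, trans):
    """Explicit path of unobserved steps, inside S, from an entry point to a dead state."""
    fronts = [set(dead)]            # stack of backward fronts, starting at S_dead
    seen = set(dead)
    while not (fronts[-1] & entries):
        front = (preimg(fronts[-1], trans) & S) - seen
        if not front:               # no entry point can reach a dead state
            return None
        seen |= front
        fronts.append(front)
    state = min(fronts.pop() & entries)
    path = [state]
    while fronts:                   # walk forward through the popped fronts
        state = min(img({state}, trans) & fronts.pop())
        path.append(state)
    return path

# Hypothetical node: entry point 0, dead state 2, and a cycle 1 -> 3 -> 1.
S = {0, 1, 2, 3}
trans = {(0, 1), (1, 2), (1, 3), (3, 1)}
print(path_to_dead(S, {0}, {2}, trans))
```

Because each front only contains states with a successor in the previously stacked front, the forward walk never gets stuck and follows a shortest path to a dead state.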
We now handle the case of an infinite observable sequence which corresponds
to a path $v_0 \to v_1 \to \cdots v_m \cdots v_n$ with $v_m = v_n$. We detail the search of a circuit
which passes a finite number of times (not necessarily 1) through $v_m, v_{m+1}, \ldots, v_n$.
As shown in Figure 3, to the circuit $S_l \xrightarrow{o_1} S_m \xrightarrow{o_2} S_l$ in the observation graph
corresponds for instance the circuit $s_1, s_5, s_8, s_{11}, s_2, s_6, s_9, s_{12}, s_3, s_1$ in the state graph.

We explain our algorithm on this example. We encode sets of couples of
states $C = \{(s, s')\}$ with an OBDD representation where each variable is duplicated.
We transform the image operation as follows:
$Img(C, t) = \{(s, s'') \mid \exists (s, s') \in C \text{ and } s' \xrightarrow{t} s''\}$. In other words, the image operates on the right item. We handle
a sequence of sets $C_0, C'_0, C_1, C'_1, \ldots$ where $(s, s') \in C_i$ if $s' \in S_l$ is reachable from
$s \in S_l$ by a sequence whose projection on the observed events is $(o_1 o_2)^i$ and where
$(s, s') \in C'_i$ if $s' \in S_m$ is reachable from $s \in S_l$ by a sequence whose projection on
the observed events is $(o_1 o_2)^i o_1$.

We start with $C_0 = \{(s, s) \mid s \in S_l\}$. Then we iterate simultaneously on the $C_i$
and the $C'_i$ as follows.

$C_0 = C_0 \cup Img(C_0, Unobs)$

$C'_i = C'_i \cup Img(C'_i, Unobs) \cup Img(C_i, o_1)$

$C_{i+1} = C_{i+1} \cup Img(C_{i+1}, Unobs) \cup Img(C'_i, o_2)$

We stop as soon as some Ci for i > 0 contains a couple (sioop,sioop)- Now we 
restrict the sets to items (stoop, s) and we project them on their second 

component obtaining sets that we denote Hj , H'. . Then we apply a construction 
for sloop similar to the one for through the sets . . . , 

At last, the construction of the prefix which leads from sq to sioop is identical 
to the construction of the sequence from so to sdead- 
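
The iteration on couples can be illustrated with explicit Python sets instead of OBDDs. This is only a sketch under assumptions: the transition relation is a hypothetical dictionary `trans` mapping a state to `(label, successor)` pairs, the names `img`, `close_unobs` and `find_loop_state` are ours, and a bound `max_rounds` is added so the sketch always terminates.

```python
def img(couples, trans, labels):
    """Image on the right item: {(s, s2) | (s, s1) in couples, s1 -t-> s2, t in labels}."""
    return {(s, s2)
            for (s, s1) in couples
            for (t, s2) in trans.get(s1, ())
            if t in labels}


def close_unobs(couples, trans, unobs):
    """Least fixpoint of C = C U Img(C, Unobs)."""
    result = set(couples)
    frontier = set(couples)
    while frontier:
        frontier = img(frontier, trans, unobs) - result
        result |= frontier
    return result


def find_loop_state(s_l, trans, unobs, o1, o2, max_rounds=100):
    """Return (s_loop, i) for the first i > 0 such that C_i contains (s_loop, s_loop)."""
    c = close_unobs({(s, s) for s in s_l}, trans, unobs)            # C_0
    for i in range(1, max_rounds + 1):
        c_prime = close_unobs(img(c, trans, {o1}), trans, unobs)    # C'_{i-1}
        c = close_unobs(img(c_prime, trans, {o2}), trans, unobs)    # C_i
        for (s, s2) in sorted(c):
            if s == s2:
                return s, i
    return None


# Hypothetical toy state graph: the circuit needs two rounds of o1 o2.
toy = {
    "s1": [("o1", "s5")],
    "s5": [("u", "s8")],
    "s8": [("o2", "s2")],
    "s2": [("o1", "s6")],
    "s6": [("o2", "s1")],
}
print(find_loop_state({"s1", "s2"}, toy, {"u"}, "o1", "o2"))
```

In the toy graph the circuit closes only after two passes through the observed cycle, which matches the remark that the number of passes is not necessarily 1.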



Fig. 3. Example of cyclic path in the symbolic observed graph



The case of a divergent sequence is similarly handled except that we remain 
inside the set of states associated to the final node of the path of the observation 
graph. 




Design and Evaluation of a Symbolic and Abstraction-Based Model Checker 209 



Our algorithms are currently being evaluated and the results will be presented in a forthcoming paper. However, we anticipate that the critical factor will be the management of the sets of couples.

5 Conclusion 

In this work, we have presented a new method for the symbolic model checking problem. This method builds an observation graph represented in a non-symbolic way but where the nodes are essentially symbolic sets of states. Then a standard model checking algorithm is applied on this graph, which usually has a very moderate size. Thus we have focused our work on efficient symbolic algorithms for the subproblems involved in the construction of this graph. The evaluations we have done on standard examples have shown that our method outperforms the pure symbolic methods, which makes it attractive. We are pursuing this work in different directions. First, it is straightforward to adapt our method to a stuttering-invariant propositional linear time logic. From a software point of view, we want to transform our prototype into an intermediate library which can be bound with different OBDD software packages and called by a verification framework. We have noticed that the performance of our method depends on the events to be observed much more than on the number of these events. Since the choice of a superset of these events is still possible, we want to investigate heuristics for this choice based on the structure of the model and more particularly on a Petri net model. Finally, we are looking for some characterization of the parallel systems for which our dichotomic symbolic search of initial SCCs of the state graph outperforms a standard symbolic search.
Abstract. The precise determination of worst-case execution times (WCETs) for programs is mostly performed on fully linked executables, since all needed information is available and all machine parameters influencing cache performance are available to the analysis. This paper describes how to perform a component-wise prediction of the instruction cache behavior guaranteeing conservative results compared to an analysis of a fully linked executable. This proves the correctness of the method based on a previous proof of correctness of the analysis of fully linked executables. The analysis is described for a general $A$-way set associative cache. The only assumption is that the replacement strategy is LRU.



1 Introduction 



Programs in hard real-time systems must not only deliver correct results, but also be guaranteed to terminate their tasks within their deadlines. All existing techniques for schedulability analysis require the worst-case execution time (WCET) of each task in the system to be known. Caches, pipelines, and other performance-enhancing processor components make the determination of WCETs nontrivial. Sound cache-analysis techniques provide reliable information about the contents of the caches at all program points. This information helps to sharpen WCET estimation. Modern processors use instruction and data caches. In this paper we are interested in instruction caches only. So far, WCET-determination methods mostly work on fully linked executables, since all machine-level information about allocation is fixed and available. This paper presents a method for component-wise analysis of the cache behavior. It uses the notion of cache-equivalence of memory allocations to express that one allocation of a module in memory will display exactly the same cache behavior as an equivalent one. This equivalence is exploited to influence the linker, which can choose between several equivalent allocations when placing a module into the linked executable. The overall picture is the following:
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CASE EXECUTION TIME (WCET) ANALYSIS, 2004. 

** Supported by the IMPRS (International Max-Planck Research School for Computer Science)
*** Research reported here is supported by the transregional research center AVACS (Automatic Verification and Analysis of Complex Systems) of the DFG (Deutsche Forschungsgemeinschaft)

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 211-229, 2004.

© Springer-Verlag Berlin Heidelberg 2004




212 



A. Rakib et al. 



1. A set of modules making up the real-time program is given. Cyclic calling dependences are assumed to only exist inside modules, not between modules. Thus, the module dependence graph representing inter-module calling relations is acyclic.

2. A bottom-up module-wise analysis computes a sound approximation to the cache contents at all program points of all modules, taking safe approximations of the cache damages of called external procedures into account.

3. The results of the module-wise analysis are conservative with respect to an in general more precise analysis of a fully linked executable.

2 Cache Memory Architectures 

A cache memory is a small fast memory located between the CPU and the main memory. The cache holds copies of the most frequently used memory blocks of instructions and data, leading to faster overall access. When the CPU wants to access a memory block at a certain address, it checks the cache to see if it is there. If it is present in the cache, then the data is delivered from the cache; this is called a cache hit. If it is not in the cache, it has to be brought from memory into the cache; this is called a cache miss.

A few parameters determine the cache architecture:

cache size $s$: Total size of the cache, i.e. the number of bytes it may contain.

line size $l$: Number of contiguous bytes that are transferred from memory on a cache miss. A cache line can hold $l$ contiguous bytes of memory contents called a memory block.

number of cache lines $n$: $s/l$

associativity $A$: Number of lines in the cache where a particular memory block can reside.

The most general cache architecture is the $A$-way set associative cache: a memory block can reside in $A$ different cache lines forming a set. The position of a memory block $m$ with address $a$ is in the set $a \,\%\, (n/A)$ (% denotes the modulo division), where $n/A$ is the number of sets in the cache. Associativity is a very important factor in determining how the cache is mapped to the system memory. There are three different ways that this mapping can usually be done, and according to the mapping technique the cache is named a direct-mapped cache, a fully-associative cache, or an $A$-way set associative cache.

A special case for $A = 1$ is the direct-mapped cache: a memory block can reside in exactly one cache line. The position in the cache of a memory block $m$ with address $a$ is at $a \,\%\, L$, where $L$ is the number of lines in the cache. Another special case is the fully-associative cache: a memory block can reside in any cache location.

Memory blocks compete for the same set. For an $A$-way set associative cache of size $A \cdot 2^u$ bytes with a line size of $l = 2^v$ bytes, we have $A \cdot 2^{u-v}$ cache lines and $\eta = 2^{u-v}$ sets. A memory block address represented by $m$ bits can be divided according to Figure 1; the $(u - v)$-bit index field is used to determine the set. Two memory blocks of the same "index" compete for the same set of $A$ cache lines. They are of the same "index" if their addresses differ by a multiple of $2^{(u-v)+v}$, i.e. of $2^u$.
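
As a small illustration of the address division, the following Python sketch computes the cache geometry and the index field of a byte address. The function names and the sample parameters ($A = 4$, $u = 10$, $v = 4$) are assumptions chosen for the example; they do not come from the paper.

```python
def cache_geometry(assoc, u, v):
    """Geometry of an assoc-way cache of size assoc * 2**u bytes with 2**v-byte lines."""
    line_size = 2 ** v
    num_lines = assoc * 2 ** (u - v)
    num_sets = 2 ** (u - v)
    return line_size, num_lines, num_sets


def set_index(byte_address, u, v):
    """Index field: the (u - v) bits directly above the v offset bits."""
    return (byte_address >> v) % (2 ** (u - v))


# Assumed sample cache: 4-way, 4 * 2**10 bytes, 16-byte lines.
print(cache_geometry(4, 10, 4))
# Two addresses that differ by 2**u share the same index field.
print(set_index(0x1234, 10, 4), set_index(0x1234 + 2 ** 10, 10, 4))
```

Addresses that differ by a multiple of $2^u$ yield the same index and therefore compete for the same set, while the next memory block (address plus $2^v$) falls into the following set.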
Fig. 1. Memory address division for cache mapping



3 Cache Semantics 

We are considering the semantics of an $A$-way set associative cache with LRU replacement strategy. Suppose it contains $n/A$ cache sets $F = \{f_0, f_1, f_2, \ldots, f_{n/A-1}\}$, each of which consists of $A$ cache lines $L = \{l_0, l_1, l_2, \ldots, l_{A-1}\}$, where $n$ = cache size / line size. The store $M_i = \{m^i_0, m^i_1, m^i_2, \ldots, m^i_{k_i}\}$ of each object module is divided into blocks of the size of a cache line so that one block can be transferred into one cache line. For all $m^i_j \in M_i$, the index $i$ is used to recognize the corresponding module $M_i$ and $j$ represents the location of $m^i_j$ in memory. The address function $adr : M_i \to \mathbb{N}_0$ determines the address of each memory block, and the function $set : M_i \to F$ determines the set where a memory block will be stored in the cache. It is defined by $set(m^i_j) = f_k \Leftrightarrow k = adr(m^i_j) \,\%\, (n/A)$. Each set contains $A$ cache lines. If a cache line in a set does not contain any memory block then it is empty. We denote this by associating content $I$ with it. Thus, $M'_i = M_i \cup \{I\}$.

Definition 1. A concrete set state describes the contents of one set of a cache. It is mathematically defined as a function $s : L \to M'_i$ where $\forall l_a, l_b \in L : s(l_a) = s(l_b) \Rightarrow s(l_a) = s(l_b) = I \lor l_a = l_b$. $S$ denotes the set of all concrete set states.

A concrete cache state is a function $c : F \to S$. $C$ denotes the set of all concrete cache states. The initial cache state $c_I$ maps all cache lines to $I$.

A concrete set update function $U_S : S \times M_i \to S$ describes the new concrete set state for a given concrete set state and a referenced memory block.

A concrete cache update function $U_C : C \times M_i \to C$ describes the new concrete cache state for a given concrete cache state and a referenced memory block. It depends on the replacement strategy of the cache, cf. Sec. 3.1.



Definition 2. An abstract set state is a function $\hat{s} : L \to 2^{M_i}$, where $\forall l_a, l_b \in L : \forall m^i_j \in M_i : m^i_j \in (\hat{s}(l_a) \cap \hat{s}(l_b)) \Rightarrow l_a = l_b$. $\hat{S}$ denotes the set of all abstract set states.

An abstract cache state is a function $\hat{c} : F \to \hat{S}$. $\hat{C}$ denotes the set of all abstract cache states.

An abstract set update function $\hat{U}_{\hat{S}} : \hat{S} \times M_i \to \hat{S}$ describes the new abstract set state for a given abstract set state and a referenced memory block.

An abstract cache update function $\hat{U}_{\hat{C}} : \hat{C} \times M_i \to \hat{C}$ describes the new abstract cache state for a given abstract cache state and a referenced memory block.

A join function $\hat{J} : \hat{C} \times \hat{C} \to \hat{C}$ combines two abstract cache states at a control flow join.



3.1 The LRU Replacement Strategy 

When there are invalid cache lines and a new memory block is copied into the cache, the memory block is placed into the first such line. If all cache lines are valid, the least recently used (LRU) memory block is replaced. The absence or presence of a memory block in the cache can be determined by its relative age according to the LRU replacement strategy, where the set of ages for an $A$-way set associative cache is $\mathcal{A} = \{0, 1, 2, \ldots, A-1\}$. The most recently used memory block has age 0 and the least recently used memory block has age $A - 1$, and we include an additional age $\top$ so the age set becomes $\mathcal{A} \cup \{\top\}$. If $s(l_p) = m^i_j$ and $m^i_j \neq I$ for a concrete set state $s$, then $p$ is the relative age of the memory block $m^i_j$ according to the LRU replacement strategy and not necessarily the physical position in the cache hardware.

If a memory block $m^i_j$ is associated with an age $a \in \mathcal{A}$, it is in the cache with age $a$; if it is associated with age $\top$, it is not in the cache. If all cache lines are valid and the accessed memory block is not in the cache, it is put into the cache with age 0 and all memory blocks in the cache age by 1; consequently, the least recently used memory block is removed from the cache. On the other hand, if the accessed memory block is already in the cache with age $a$, its age is reset to 0 and the ages of all younger memory blocks¹ are increased by 1, but the ages of older memory blocks remain unchanged.
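
The LRU rule for a single concrete set can be sketched in Python as follows. This is an illustration under assumptions, not the formal update function $U_S$: a set state is modelled as a list indexed by relative age, `None` stands for the empty content $I$, and the function name `lru_update` is ours.

```python
def lru_update(set_state, block):
    """Return the new age-ordered set state after a reference to `block`.

    set_state[p] is the block of relative age p, or None for an empty line.
    Assumes empty lines only occur at the oldest positions, which holds
    when the set starts out completely empty.
    """
    assoc = len(set_state)
    if block in set_state:
        h = set_state.index(block)
        # Hit: the block gets age 0, younger blocks age by 1, older ones keep their age.
        return [block] + set_state[:h] + set_state[h + 1:]
    # Miss: the block gets age 0, all others age by 1, the oldest one drops out.
    return ([block] + set_state)[:assoc]


state = [None] * 4
for b in ["a", "b", "c", "d", "e"]:
    state = lru_update(state, b)
print(state)                   # "a" has been evicted
print(lru_update(state, "c"))  # hit on a block of age 2
```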



4 Cache Analysis and Abstract Interpretation 

Abstract Interpretation [CC92] is a technique to statically analyze the dynamic properties of a program at compile time without actually executing it. It can be used in the context of cache behavior [Wil04] by computing invariants over the set of all concrete caches that may occur at a program point. These sets of concrete caches at all program points are a part of the collecting semantics of the program. This collecting semantics is approximated from above by abstract cache states, providing safe predictions of cache contents. The program analyzer generator PAG [Mar98] is designed to support the implementation of abstract interpretation.



5 The Analysis Framework 

5.1 Analysis Model 

As input to our analysis framework (cf. Fig. 2), we have a set of object-code modules that are yet to be linked to form an executable. Additional user-provided information

¹ If $m^i_j \in s(l_x)$ and $m^i_k \in s(l_y)$ where $y < x$, then $m^i_k$ is younger or equal to $m^i_j$ with respect to the relative age.
Fig. 2. Analysis structure



about the number of loop iterations, upper bounds for recursion, etc. is also assumed to be available. A parser reads the object-code modules and reconstructs the control flow [The00]. The reconstructed control flow graphs are used as the input for microarchitecture analysis. The cache analysis classifies the references to main memory into cache hits, cache misses, or unclassified references. A reference classified as a cache hit indicates that all accesses to memory executed through this reference at run time will indeed hit the cache. Analogously, a reference classified as a cache miss indicates that all accesses to memory executed through this reference at run time will indeed miss the cache. If a reference could be classified neither as a cache hit nor as a cache miss, then it is not classified.



5.2 Module Representation by Control-Flow-Graph 

We represent each module by a control flow graph $C_g = (B, E)$, consisting of nodes $B$ and typed edges $E$. Nodes $B$ represent basic blocks, i.e. sequences of (one or more) consecutive instructions, starting with a unique entry instruction and ending with an exit instruction that leads to another basic block or ends the whole module. For each basic block it is known which memory blocks it references, i.e. there exists a mapping $\mathcal{L} : B \to M^*$ from control flow nodes to sequences of memory blocks. Initially all cache lines are invalid and referenced memory blocks will occupy cache lines according to the flow in the control flow graph. The cache is updated using the update functions.



5.3 Must Analysis 

To determine if a memory block is definitely in the cache, an abstract set state $\hat{s}^{\cap}$ is used, where the position of a memory block in the abstract set state $\hat{s}^{\cap}$ is an upper bound of the positions of the memory block in the concrete set states that $\hat{s}^{\cap}$ represents. The relative age of a memory block $m^i_j$ in a set can only be changed by references to memory blocks $m^i_k$ where $set(m^i_j) = set(m^i_k)$; even then, the relative age of $m^i_j$ will not be changed by references to memory blocks that are already in the same set and are younger than or the same age as $m^i_j$. To determine a new abstract set state for a given abstract set state and a referenced
Fig. 3. Update of a 4-way set associative must (sub-)cache



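memory block, the abstract set update function $\hat{U}^{\cap}_{\hat{S}} : \hat{S}^{\cap} \times M_i \to \hat{S}^{\cap}$ is used, where

$$
\hat{U}^{\cap}_{\hat{S}}(\hat{s}^{\cap}, m) =
\begin{cases}
\left[\begin{array}{l}
l_0 \mapsto \{m\}, \\
l_x \mapsto \hat{s}^{\cap}(l_{x-1}) \mid x = 1, 2, \ldots, h-1, \\
l_h \mapsto \hat{s}^{\cap}(l_{h-1}) \cup (\hat{s}^{\cap}(l_h) - \{m\}), \\
l_x \mapsto \hat{s}^{\cap}(l_x) \mid x = h+1, \ldots, A-1
\end{array}\right]; & \text{if } \exists l_h : m \in \hat{s}^{\cap}(l_h) \\[2ex]
\left[\begin{array}{l}
l_0 \mapsto \{m\}, \\
l_x \mapsto \hat{s}^{\cap}(l_{x-1}) \mid x = 1, 2, \ldots, A-1
\end{array}\right]; & \text{otherwise}
\end{cases}
$$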

To describe the new abstract cache state for a given abstract cache state and a referenced memory block, an abstract cache update function $\hat{U}^{\cap}_{\hat{C}} : \hat{C}^{\cap} \times M_i \to \hat{C}^{\cap}$ is used:

$$\hat{U}^{\cap}_{\hat{C}}(\hat{c}^{\cap}, m^i_j) = \hat{c}^{\cap}[set(m^i_j) \mapsto \hat{U}^{\cap}_{\hat{S}}(\hat{c}^{\cap}(set(m^i_j)), m^i_j)].$$

A (sub-)cache² update is shown in Fig. 3. A join function $\hat{J}^{\cap} : \hat{C}^{\cap} \times \hat{C}^{\cap} \to \hat{C}^{\cap}$ combines two abstract cache states at a control flow join. The concept of the join function is like set intersection: the join of two abstract cache states produces a state containing only the memory blocks contained in both states, and the resulting age of a memory block is the maximum of its ages in the operand states. If $\{l_0, \ldots, l_{A-1}\}$ is the fully associative set of the cache with $0 \le x \le A - 1$ and $m^i_j \in \hat{s}^{\cap}(l_x)$, then $m^i_j$ will remain in the cache for at least $(A - 1) - x$ cache updates that put a new element into the cache.

The solution of the must analysis can be described as follows: Let $\hat{c}^{\cap}$ be an abstract cache state at a control flow node $k$ that references a memory block $m^i_j$ and $\hat{s}^{\cap} = \hat{c}^{\cap}(set(m^i_j))$. If $m^i_j \in \hat{s}^{\cap}(l_x)$ for a set line $l_x$, then $m^i_j$ is definitely in the cache. A reference to $m^i_j$ is classified as always hit (ah).

² For the sake of space we have shown it only for the set $f_0$.
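
The must update and the must join for one set can be sketched in Python as follows. This is a minimal illustration under assumptions: an abstract set state is a list of Python sets indexed by age, the names `must_update` and `must_join` are ours, and the case $h = 0$ (the referenced block already has age 0), which the update formula leaves implicit, is treated as leaving the state unchanged.

```python
def must_update(s, m):
    """Abstract must update of one set state `s` (list of sets, index = age bound)."""
    assoc = len(s)
    h = next((x for x in range(assoc) if m in s[x]), None)
    if h is None:
        # m is not known to be cached: every block ages by one, the oldest drop out.
        return [{m}] + [set(s[x - 1]) for x in range(1, assoc)]
    if h == 0:
        # Assumption: m already has age 0, so nothing changes.
        return [set(line) for line in s]
    return ([{m}]
            + [set(s[x - 1]) for x in range(1, h)]
            + [set(s[h - 1]) | (set(s[h]) - {m})]
            + [set(s[x]) for x in range(h + 1, assoc)])


def must_join(s1, s2):
    """Keep only blocks present in both states, each with its maximal age."""
    age1 = {m: x for x, line in enumerate(s1) for m in line}
    age2 = {m: x for x, line in enumerate(s2) for m in line}
    out = [set() for _ in s1]
    for m in age1.keys() & age2.keys():
        out[max(age1[m], age2[m])].add(m)
    return out


s = [{"a"}, {"b"}, {"c"}, {"d"}]
print(must_update(s, "e"))  # miss: d drops out
print(must_update(s, "c"))  # hit at age 2: a and b age, d keeps its age
print(must_join([{"a"}, set(), {"c"}, {"d"}], [{"c"}, {"e"}, {"a"}, set()]))
```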
5.4 May Analysis

It is dual to the must analysis. To determine if a memory block is never in the cache, we compute the set of all memory blocks that may be in the cache. The abstract set state $\hat{s}^{\cup}$ is used, where the position of a memory block in the abstract set state $\hat{s}^{\cup}$ is a lower bound on the positions of the memory block in the concrete set states that $\hat{s}^{\cup}$ represents. The concept of the join function is like set union: the join of two abstract cache states produces a state containing those memory blocks contained in at least one of the states, and the resulting age of a memory block is the minimum of its ages in the operand states; for the details, we refer to [Fer97].

The solution of the may analysis can be described as follows: Let $\hat{c}^{\cup}$ be an abstract cache state at a control flow node $k$ that references a memory block $m^i_j$ and $\hat{s}^{\cup} = \hat{c}^{\cup}(set(m^i_j))$. If $m^i_j \notin \hat{s}^{\cup}(l_x)$ for an arbitrary set line $l_x$, then $m^i_j$ is definitely not in the cache. A reference to $m^i_j$ is classified as always miss (am).
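
The may join described above can be sketched in Python for one set. As an assumption for illustration, an abstract set state is again a list of Python sets indexed by age, and the name `may_join` is ours; the may update itself is not shown since the text defers its details to [Fer97].

```python
def may_join(s1, s2):
    """Keep blocks present in at least one state, each with its minimal age."""
    age = {}
    for state in (s1, s2):
        for x, line in enumerate(state):
            for m in line:
                age[m] = min(age.get(m, x), x)
    out = [set() for _ in s1]
    for m, x in age.items():
        out[x].add(m)
    return out


print(may_join([{"a"}, set(), {"c"}, {"d"}], [{"c"}, {"e"}, {"a"}, set()]))
```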



5.5 Classification of Memory References 

If the memory reference could neither be classified as ah nor am, then it is not classified and is abbreviated as nc [FHL+01].

$$
class(\hat{c}^{\cap}, \hat{c}^{\cup}, m^i_j) =
\begin{cases}
ah & \text{if } \exists l_x : m^i_j \in \hat{c}^{\cap}(set(m^i_j))(l_x); \\
am & \text{if } \nexists l_x : m^i_j \in \hat{c}^{\cup}(set(m^i_j))(l_x); \\
nc & \text{otherwise.}
\end{cases}
$$



5.6 Termination of the Analyses

Since there are only a finite number of sets and set lines, and for each program a finite number of memory blocks, the domain of the abstract cache states $\hat{c} : F \to (L \to 2^{M_i})$ is finite. Hence, every ascending chain is finite. Additionally, the abstract cache update functions $\hat{U}$ are monotonic, and the join functions $\hat{J}$ are monotone, associative and commutative. This guarantees the termination of all the analyses [Fer97].
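
The classification of a memory reference can be sketched in Python as follows. This is an illustration under assumptions: an abstract cache state is a dictionary from set index to a list of Python sets (one per line), and `set_of` is a hypothetical function returning the set index of a block.

```python
def classify(must_cache, may_cache, block, set_of):
    """Return 'ah', 'am' or 'nc' for a reference to `block`."""
    f = set_of(block)
    if any(block in line for line in must_cache[f]):
        return "ah"  # definitely in the cache
    if not any(block in line for line in may_cache[f]):
        return "am"  # definitely not in the cache
    return "nc"      # not classified


# Hypothetical one-set example with two lines.
must = {0: [{"a"}, set()]}
may = {0: [{"a", "b"}, set()]}
print([classify(must, may, m, lambda _m: 0) for m in ("a", "b", "c")])
```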



6 Object Modules and Linker 



It is preferable that a program can be written as a collection of smaller source files rather than one monolithic mass, i.e. decomposition of a large program into multiple modules is a prerequisite for carrying out large software projects. Several source files written by different people can be compiled separately and subsequently integrated to form a complete program. The integration process is carried out by a linker or link editor. Compilers and assemblers create object modules containing the generated binary code and data for a source file. A linker merges several object modules (it can also take required object modules from a library) into a single executable that can be loaded and executed.

The object modules containing machine code and information for the linker are often divided up into many small logical segments that will be treated differently by the linker. Information for the linker comes in the form of symbol definitions, which generally fall into two categories: exported symbols and imported symbols. Exported symbols are functions or variables that are present in an object module and will be available for use by other object modules. Imported symbols are functions or variables that are called or referenced by this object module but not internally defined. The linker resolves references to undefined symbols by finding out which other object module defines those symbols. Each object module has its own address space, usually starting at address 0. We perform cache analysis using the available object module address space. How the linker combines different sections of object modules to form an executable, relocation techniques, and further details about object modules and linkers can be found in [Lev00].



7 Notion of Equivalence 



Our aim is to ensure that the precomputed cache-behavior results obtained at module level can be combined in a conservative way with respect to the results of an analysis of a fully linked executable. The relative and absolute address spaces of a module before and after linking, respectively, will in general be different. Consequently, the cache behavior of the same memory block of a module may be different in two different address spaces. We perform cache analysis for each module using the available relative address information and a fixed mapping of relative addresses of an object module to sets. Since the linker will place object modules in different address spaces during linking, the following subsections are mainly concerned with the condition under which the cache-behavior analysis results of one allocation in memory are conservative w.r.t. another one.



Fig. 4. Equivalent cache mapping (panels: object modules, linked executable)
7.1 Equivalent Memory Allocation with Respect to the Fixed Set Mapping

Suppose a set of modules $M_1, M_2, \ldots, M_p$ forms a real-time program. A linker combines these modules in the sequence $M_1, M_2, \ldots, M_p$ to form an executable. We assume that all object modules are created at address 0 during compilation. Consider the following stores of $p$ object modules

$$M_1 = \{m^1_0, m^1_1, m^1_2, \ldots, m^1_{k_1}\}$$

$$M_2 = \{m^2_0, m^2_1, m^2_2, \ldots, m^2_{k_2}\}$$

$$\vdots$$

$$M_p = \{m^p_0, m^p_1, m^p_2, \ldots, m^p_{k_p}\}.$$

According to the assumption, each $m^i_0$ ($1 \le i \le p$) is mapped to the set $f_0$, since $adr(m^i_0) = 0$ for all $1 \le i \le p$. The question arises, when we consider a fully linked executable of the whole program, whether the absolute address corresponding to a relative address will be mapped to the same set. Since linkers only shuffle segments of object modules but do not rearrange their internals, all the individual memory addresses used before become offsets from the new base address at which the modules are placed. So we are basically interested in the newly placed base addresses of all modules. Suppose the executable will be created at address $q$, where $q = n \cdot 2^u$ for some $n \in \mathbb{Z}^+$. Hence we get $M_1 = \{m^1_q, m^1_{q+1}, m^1_{q+2}, \ldots, m^1_{q+k_1}\}$, the newly placed store of module $M_1$. Since $adr(m^1_q) - adr(m^1_0) = n \cdot 2^u$, $m^1_q$ is mapped to the set $f_0$, cf. Sec. 2. Consequently, the absolute address corresponding to a relative address of module $M_1$ will be mapped to the same set. Now consider modules $M_2, \ldots, M_p$.

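Let

$$X_j = \sum_{i=1}^{j} (\mathit{sizeof}(M_i) + \varepsilon_i), \quad \text{for all } 1 \le j \le p - 1 \qquad (1)$$

where the wasted space $\varepsilon_i$ between modules $M_i$ and $M_{i+1}$ for all $1 \le i \le p - 1$ is calculated by

$$\varepsilon_1 = l \cdot (((\eta - 1) - (adr(m^1_{q+k_1}) \,\%\, \eta)) \,\%\, \eta) \qquad (2)$$

$$\varepsilon_i = l \cdot (((\eta - 1) - \ldots , \quad \text{if } 2 \le i \le p - 1 \qquad (3)$$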

In general, a linker examines each module in turn and allocates storage sequentially: the starting address of a module is the sum of the lengths of its previous modules. But our aim is to place the modules in such a way that the base address of each module will be mapped to the set $f_0$. Consequently, the base address of a module will not only depend on the sum of the lengths of its previous modules but also on the address of the last memory block of the preceding module.

Figure 4 shows an example of equivalent memory address spaces with respect to the fixed set mapping, where the number of sets is four. We can see that the last memory block $m^1_{q+k_1}$ of $M_1$ maps to the set $f_1$. Consequently, the base address of $M_2$ should be $q + k_1 + 2 \cdot l$ so that the absolute base address of $M_2$ will be mapped to the set $f_0$. The exact positions to place modules $M_2, \ldots, M_p$ are calculated using equation (1), which in turn depends on equations (2) and (3).

Let us look at the addresses of the set of memory blocks $M = \{m^1_0, m^2_0, \ldots, m^p_0\}$, each of which occupies the first position in the corresponding object module. We enforce a restriction on the absolute base address of each module so that the relative and the absolute base addresses are offset by a multiple of $2^u$. By restricting $q$ such that $q = n \cdot 2^u$ for some $n \in \mathbb{Z}^+$, we have $set(m^i_0) = f_0$ for all $m^i_0 \in M$.
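
The placement idea can be sketched in Python. This is a simplified illustration under assumptions, not the paper's exact equations: addresses and module sizes are counted in memory blocks rather than bytes, the padding after each module follows the form of equation (2), and the names `place_modules`, `sizes_in_blocks` and `num_sets` are ours.

```python
def place_modules(sizes_in_blocks, num_sets, q=0):
    """Return base block addresses so that every module starts in set f_0.

    q is the base block address of the executable and must map to set 0.
    """
    if q % num_sets != 0:
        raise ValueError("q must map to set 0")
    bases = []
    base = q
    for size in sizes_in_blocks:
        bases.append(base)
        last = base + size - 1
        # Wasted blocks so that the next module again starts in set 0.
        pad = ((num_sets - 1) - (last % num_sets)) % num_sets
        base = last + 1 + pad
    return bases


# Four sets as in the example: the last block of the first module maps to
# set 1, so two blocks are wasted before the second module.
print(place_modules([6, 3, 5], num_sets=4, q=8))
```

With these assumed sizes, every returned base address is a multiple of the number of sets, so the set mapping computed on relative addresses carries over to the linked executable.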



7.2 Conservative Cache-Behavior Analysis 

In this subsection we present the notion of precision of one analysis result w.r.t. the other. We consider only the must analysis; the may analysis can be treated analogously. First, we define the order of precision for two abstract set states by:

$$\hat{s}^{\cap}_1 \sqsubseteq \hat{s}^{\cap}_2 \Leftrightarrow \forall l_x \in L, \forall m^i_j \in M_i : m^i_j \in \hat{s}^{\cap}_2(l_x) \Rightarrow \exists l_y \in L : m^i_j \in \hat{s}^{\cap}_1(l_y) \land y \le x,$$

i.e. the abstract set state $\hat{s}^{\cap}_1$ provides more precise cache information than $\hat{s}^{\cap}_2$ if and only if any memory block in $\hat{s}^{\cap}_2$ with an associated age is also in $\hat{s}^{\cap}_1$ with an age that is at most equal to the age in $\hat{s}^{\cap}_2$.

Now we define the order of precision of the whole cache by:

$$\hat{c}^{\cap}_1 \sqsubseteq \hat{c}^{\cap}_2 \Leftrightarrow \forall f_k \in F : \hat{c}^{\cap}_1(f_k) \sqsubseteq \hat{c}^{\cap}_2(f_k)$$

According to the linking strategy in Sec. 7.1, the placements of object modules in the absolute address space can be defined as a function:

$$P : \{1, 2, \ldots, p\} \to \mathbb{N}_0 \text{ such that } P = \lambda i.\, b_i$$

where $b_i$ is the absolute base address (in blocks) of $M_i$ for all $1 \le i \le p$.

Let $\hat{r}_P : \hat{C}^{\cap} \to \hat{C}^{\cap}$ be the mapping which maps an abstract cache state to an abstract cache state under the placement $P$, where

$$\hat{r}_P(\hat{c}^{\cap}) = \lambda f_k. \lambda l_x. \{m \mid adr(m) = adr(m^i_j) + P(i) : m^i_j \in M_i \land m^i_j \in \hat{c}^{\cap}(f_k)(l_x)\}$$

We choose for linking of the object modules a placement $P$ such that $P(i) \,\%\, \eta = 0$ for all $1 \le i \le p$.
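
The order of precision on abstract set states can be checked with a few lines of Python. As an assumption for illustration, a must set state is a list of Python sets indexed by age, and the name `more_precise` is ours.

```python
def more_precise(s1, s2):
    """True if every block of s2 also occurs in s1 with an age that is not larger."""
    age1 = {m: y for y, line in enumerate(s1) for m in line}
    return all(m in age1 and age1[m] <= x
               for x, line in enumerate(s2)
               for m in line)


s1 = [{"a"}, {"b"}, set(), set()]
s2 = [set(), {"a"}, set(), {"b"}]
print(more_precise(s1, s2), more_precise(s2, s1))
```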



8 Proposed Analysis Method 

An inter-module dependency graph $D_g = (V, E)$ (cf. Fig. 5(a)) consists of a set of vertices $V$ representing modules and a set of call edges $E$ representing module calls. A "module call" actually means a procedure call between modules. Our analysis




Component-Wise Instruction-Cache Behavior Prediction 



221 



method is based on a bottom-up approach, starting from maximal elements, i.e. modules which are not dependent on any other module in the inter-module dependency graph. Note that there are no cyclic dependencies between program modules. The basic idea of the model we present for module cache analysis is that at each stage (at each node) the produced results are conservative and the resulting cache information is kept in a module analysis result table $T_{M_i}$, so that the analysis results of a module $M_i$ will be available for its predecessor nodes (its calling modules).
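
The bottom-up order over an acyclic dependency graph can be obtained with a topological sort. The sketch below uses Python's standard `graphlib` module; the dictionary `calls` and the module names are hypothetical and only illustrate the shape of the input.

```python
from graphlib import TopologicalSorter


def bottom_up_order(calls):
    """Order modules so that every callee comes before its callers.

    `calls` maps each module to the set of modules it calls. A cyclic
    dependency makes TopologicalSorter raise graphlib.CycleError.
    """
    return list(TopologicalSorter(calls).static_order())


# Hypothetical dependency graph: two modules call a third one.
calls = {"M_l": {"M_n"}, "M_m": {"M_n"}, "M_n": set()}
order = bottom_up_order(calls)
print(order)
```

Analyzing modules in this order guarantees that the result table of a called module is already filled when one of its callers is processed.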




Fig. 5. (a)Inter-module dependency graph (b)Procedure call 



8.1 Module-Wise Cache Analysis 

A module may have a set of procedures and the cache behavior results of a particular procedure may only be needed at its calling module. We analyze cache behavior for each procedure separately. We consider the following two call contexts during the analysis of a module.

- Local call: A call between two procedures is said to be a local call if both procedures are in the same module. For example, the procedure $foo_j()$ calls the procedure $foo'_j()$ in module $M_j$, as depicted in Fig. 5(b).

Suppose we analyze the cache behavior of $foo_j()$. During the execution of the call instruction, control passes to the procedure $foo'_j()$ and the instructions of $foo'_j()$ are cached according to the flow in the control flow graph of $foo'_j()$. In this case we do not use the resulting cache information of $foo'_j()$, if any, at the calling point of $foo_j()$.

- External call: A call between two procedures is said to be an external call if the callee and the caller procedures are in different modules. For example, the procedure $foo_i() \in M_i$ calls the procedure $foo_j() \in M_j$, as depicted in Fig. 5(b).

In this case, at the calling point of $foo_i()$ we use the resulting cache information of $foo_j()$, if any.
According to the graph in Figure 5(a), we start cache analysis on module $M_n$. After the analysis of module $M_n$, the data flow information is stored in the table

$$T_{M_n} = \{(\hat{C}_{M_n foo_i()}, \hat{\mathcal{A}}_{M_n foo_i()}) \mid foo_i() \in M_n\}$$

and $M_n$ is removed from the dependency graph, where $\hat{C}_{M_n foo_i()}$ represents the set of all abstract cache states for the procedure $foo_i()$ in module $M_n$. $\hat{\mathcal{A}}_{M_n foo_i()}$ is the information about the cache damage, which is defined in the next subsection. When we analyze a procedure, initially we assume that each cache line in the must cache (the cache on which the must analysis is performed) contains $\top$, whereas for the may cache (the cache on which the may analysis is performed), initially we assume that everything may be in the cache with age 0. The cache analysis of $M_l$ or $M_m$ uses the pre-computed cache information of its called module $M_n$ whenever needed at calling points. Since the cache content of the calling module will be changed according to the called module's cache information, one has to be very careful about the cache damages of called external procedures. In the following subsection we describe our approach to handling such cache effects during a module call.



8.2 Cache-Damage Analysis 



The aim of the cache-damage analysis is to provide correct information about the bounds of replacements in a particular set. The basic idea is to determine an upper bound for the number of replacements occurring in a particular set for the must cache, whereas a lower bound for the number of replacements occurring in a particular set will be determined for the may cache. The domain of the cache-damage analysis is $(D, \sqsubseteq, \sqcup, \sqcap, \top, \bot)$, where $D = \{0, \ldots, A\} \subset \mathbb{N}$. The ordering relation $\sqsubseteq$ is simply the order $\le$ of natural numbers and $\sqcup\{m, n\} = \max(m, n)$, $\sqcap\{m, n\} = \min(m, n)$ for $m, n \in D$.

We define an operation $\oplus$ on elements of $D$ by

$$
n \oplus m =
\begin{cases}
A, & \text{if } n + m > A \\
n + m, & \text{otherwise}
\end{cases}
$$



Definition 3. Let $\mathcal{A} = \{a \mid a : F \to D\}$, where $a$ determines the number of replacements in a set with respect to the concrete cache semantics.

The concrete replacement update with respect to the concrete cache semantics is a function $\mathcal{A}_{up} : C \times \mathcal{A} \times M_i \to \mathcal{A}$ defined by

$$\mathcal{A}_{up}(c, a, m^i_j) = a'$$

where

$$
\forall f_k \in F : a'(f_k) =
\begin{cases}
a(f_k), & \text{if } set(m^i_j) \neq f_k \\
a(f_k), & \text{if } set(m^i_j) = f_k \land \exists l_x : c(f_k)(l_x) = m^i_j \\
a(f_k) \oplus 1, & \text{otherwise}
\end{cases}
$$

A concrete memory access update is a function $up : C \times \mathcal{A} \times M_i \to C \times \mathcal{A}$ defined by

$$up((c, a), m^i_j) = (U_C(c, m^i_j), \mathcal{A}_{up}(c, a, m^i_j))$$



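Definition 4. Let $\hat{\mathcal{A}} = \{\hat{a} \mid \hat{a} : F \to D\}$, where $\hat{a}$ determines the number of replacements in a set with respect to the abstract cache semantics.

We use the notation $\hat{\mathcal{A}}^{\cap}$ for the must cache. In this analysis we compute the upper bound for the replacements occurring in a particular set. The ordering relation in this case is $0 \sqsubseteq 1 \sqsubseteq \ldots \sqsubseteq A$, where 0 gives the most precise information and $A$ gives the least precise information. We use the notation $\hat{\mathcal{A}}^{\cup}$ for the may cache and we compute the lower bound for the replacements occurring in a particular set. The ordering relation is $A \sqsubseteq A - 1 \sqsubseteq \ldots \sqsubseteq 0$, where $A$ gives the most precise information and 0 gives the least precise information.

The abstract replacement update with respect to the abstract cache semantics is a function $\hat{\mathcal{A}}_{up} : \hat{C} \times \hat{\mathcal{A}} \times M_i \to \hat{\mathcal{A}}$ defined by $\hat{\mathcal{A}}_{up}(\hat{c}, \hat{a}, m^i_j) = \hat{a}'$

where

$$
\forall f_k \in F : \hat{a}'(f_k) =
\begin{cases}
\hat{a}(f_k), & \text{if } set(m^i_j) \neq f_k \\
\hat{a}(f_k), & \text{if } set(m^i_j) = f_k \land \exists l_x : m^i_j \in \hat{c}(f_k)(l_x) \\
\hat{a}(f_k) \oplus 1, & \text{otherwise}
\end{cases}
$$

An abstract memory access update function $\hat{up} : \hat{C} \times \hat{\mathcal{A}} \times M_i \to \hat{C} \times \hat{\mathcal{A}}$ determines the new pair $(\hat{c}, \hat{a})$ for a given pair $(\hat{c}, \hat{a})$ and a referenced memory block, where

$$\hat{up}((\hat{c}, \hat{a}), m^i_j) = (\hat{U}_{\hat{C}}(\hat{c}, m^i_j), \hat{\mathcal{A}}_{up}(\hat{c}, \hat{a}, m^i_j)).$$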



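A cache damage join function $Ajoin : (\hat{C} \times \hat{\mathcal{A}}) \times (\hat{C} \times \hat{\mathcal{A}}) \to (\hat{C} \times \hat{\mathcal{A}})$ combines
two^ data flow informations at a control flow join, where

$$ Ajoin((\hat{c}_1, \hat{a}_1), (\hat{c}_2, \hat{a}_2)) = (\hat{J}(\hat{c}_1, \hat{c}_2), \sqcup\{\hat{a}_1, \hat{a}_2\}) $$

For the must analysis, where we determine the upper bound of replacements, we
use $\sqcup\{\hat{a}_1, \hat{a}_2\} = \max(\hat{a}_1, \hat{a}_2)$ at a control flow join. For the may analysis we use
$\sqcap\{\hat{a}_1, \hat{a}_2\} = \min(\hat{a}_1, \hat{a}_2)$.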
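The join of the replacement bounds can be sketched as follows. This is a minimal illustration in Python that assumes each bound is stored as a dictionary from cache-set names to counts; the function name `join_bounds` and this representation are assumptions made for the example, not the paper's implementation.

```python
def join_bounds(a1: dict, a2: dict, analysis: str) -> dict:
    """Join two per-set replacement bounds at a control flow join.

    The must analysis keeps an upper bound, so it takes the maximum;
    the may analysis keeps a lower bound, so it takes the minimum.
    Both dictionaries are assumed to cover the same cache sets.
    """
    if analysis not in ("must", "may"):
        raise ValueError("analysis must be 'must' or 'may'")
    pick = max if analysis == "must" else min
    return {fk: pick(a1[fk], a2[fk]) for fk in a1}
```

Because `max` and `min` are associative, a node with more than two predecessors can be handled by applying the function repeatedly.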

A cache damage update function $G : (\hat{C} \times \hat{\mathcal{A}}) \times (\hat{C} \times \hat{\mathcal{A}}) \to \hat{C} \times \hat{\mathcal{A}}$ describes the
new data flow information for two given data flow informations at a calling point.

^ Our join functions are associative. If some node has more than two predecessors, the join
function is applied iteratively.

8.3 Cache-Residue During the Module Call

In this subsection we describe the cache residue of the calling procedure after the cache
information of the callee has been returned. The callee procedure caches its own memory
blocks, which affects the abstract cache states of the caller. The cache information of the
calling procedure after the call is therefore affected by the cache damage information of
the callee procedure.

Consider two modules $M_i$ and $M_j$. Suppose a procedure $foo_i()$ in $M_i$ contains a
call site which calls the procedure $foo_j()$ in $M_j$, where $foo_j()$ is already analyzed.
Consequently, the data flow information of $foo_j()$ is available in the table

$$ T_{M_j} = \{(\hat{C}_{M_j.foo_j()}, \hat{\mathcal{A}}_{M_j.foo_j()}) \mid foo_j() \in M_j\}. $$

Let $(\hat{c}_0, \hat{a}_0)$ be the data flow information of $foo_i()$ preceding the call and $(\hat{c}_t, \hat{a}_t)$ be the
data flow information of $foo_j()$ at its return point. The cache damage update functions
for the must and the may analyses during the external call are described in the following
way:

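Let $blocks(\hat{c}_t^{\cap}(f_k)) = \{m_j \mid \exists l_x \in L : m_j \in \hat{c}_t^{\cap}(f_k)(l_x)\}$ be the set of all memory
blocks of $foo_j()$ which are in the set $f_k$ of the must cache at the return point.

The cache damage update for the must cache is defined as a function
$G^{\cap} : (\hat{C}^{\cap} \times \hat{\mathcal{A}}^{\cap}) \times (\hat{C}^{\cap} \times \hat{\mathcal{A}}^{\cap}) \to \hat{C}^{\cap} \times \hat{\mathcal{A}}^{\cap}$ such that
$G^{\cap}((\hat{c}_0^{\cap}, \hat{a}_0^{\cap}), (\hat{c}_t^{\cap}, \hat{a}_t^{\cap})) = (\hat{c}'^{\cap}, \hat{a}'^{\cap})$, where

$$ \forall f_k \in F : \hat{c}'^{\cap}(f_k) = \begin{cases} l_i \mapsto \hat{c}_t^{\cap}(f_k)(l_i), & \forall i \in [0, j-1] \\ l_i \mapsto \hat{c}_0^{\cap}(f_k)(l_{i-j}) \setminus blocks(\hat{c}_t^{\cap}(f_k)), & \forall i \in [j, A-1] \end{cases} \quad \text{where } j = \hat{a}_t^{\cap}(f_k); $$

and $\hat{a}'^{\cap}(f_k) = \hat{a}_0^{\cap}(f_k) \oplus \hat{a}_t^{\cap}(f_k)$.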

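Let $blocks(\hat{c}_t^{\cup}(f_k)) = \{m_j \mid \exists l_x \in L : m_j \in \hat{c}_t^{\cup}(f_k)(l_x)\}$ be the set of all memory
blocks of $foo_j()$ which are in the set $f_k$ of the may cache at the return point.

The cache damage update for the may cache is defined as a function
$G^{\cup} : (\hat{C}^{\cup} \times \hat{\mathcal{A}}^{\cup}) \times (\hat{C}^{\cup} \times \hat{\mathcal{A}}^{\cup}) \to \hat{C}^{\cup} \times \hat{\mathcal{A}}^{\cup}$ such that
$G^{\cup}((\hat{c}_0^{\cup}, \hat{a}_0^{\cup}), (\hat{c}_t^{\cup}, \hat{a}_t^{\cup})) = (\hat{c}'^{\cup}, \hat{a}'^{\cup})$, where

$$ \forall f_k \in F : \hat{c}'^{\cup}(f_k) = \begin{cases} l_i \mapsto \hat{c}_t^{\cup}(f_k)(l_i), & \forall i \in [0, j-1] \\ l_i \mapsto (\hat{c}_0^{\cup}(f_k)(l_{i-j}) \setminus blocks(\hat{c}_t^{\cup}(f_k))) \cup \hat{c}_t^{\cup}(f_k)(l_i), & \forall i \in [j, A-1] \end{cases} \quad \text{where } j = \hat{a}_t^{\cup}(f_k); $$

and $\hat{a}'^{\cup}(f_k) = \hat{a}_0^{\cup}(f_k) \oplus \hat{a}_t^{\cup}(f_k)$.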
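The two update functions can be illustrated for a single cache set as follows. This is a sketch under stated assumptions rather than the authors' implementation: an abstract set state is modelled as a Python list of `assoc` sets of block names, indexed by line (age), and the function names are invented for the example.

```python
def blocks(set_state):
    """All memory blocks that occur in any line of one abstract cache set."""
    found = set()
    for line in set_state:
        found |= line
    return found


def damage_update_must(c0, a0, ct, at, assoc):
    """Must update for one set: callee lines fill ages 0..j-1; caller lines are
    aged by j = at and stripped of callee blocks. The bound saturates at assoc."""
    j = at
    callee = blocks(ct)
    new = [set(ct[i]) if i < j else set(c0[i - j]) - callee for i in range(assoc)]
    return new, min(a0 + at, assoc)


def damage_update_may(c0, a0, ct, at, assoc):
    """May update for one set: as above, but ages j..assoc-1 also keep the
    callee's own blocks at that line."""
    j = at
    callee = blocks(ct)
    new = [
        set(ct[i]) if i < j else (set(c0[i - j]) - callee) | set(ct[i])
        for i in range(assoc)
    ]
    return new, min(a0 + at, assoc)
```

Removing the callee's blocks from the aged caller lines is what prevents the same block from appearing at two different ages in one set.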

In the module level analysis, we focus on external procedure calls from one
module to another. A procedure $foo_j() \in M_j$ may be called from a loop of another
procedure $foo_i() \in M_i$ (cf. Fig. 5(b)). Let us consider the must analysis. For each set
$f_k$, the basic idea of the cache damage update during an external procedure call is that
elements of $f_k$ of the caller are aged by the upper bound of replacements in $f_k$ of the
callee. The elements of $f_k$ of the callee retain their age during the cache damage update.
Let $(\hat{c}_t^{\cap}, \hat{a}_t^{\cap})$ be the data flow information of procedure $foo_j()$, available at its return
point, and let a memory block $m \in \hat{c}_t^{\cap}(f_k)(l_x)$, where $0 \le x \le A - 1$. If $(\hat{c}'^{\cap}, \hat{a}'^{\cap})$ is
the data flow information of $foo_i()$ just after the first call to the procedure $foo_j()$, then
$m \in \hat{c}'^{\cap}(f_k)(l_x)$. Since $foo_j()$ is called inside a loop, memory blocks of $foo_j()$ may
survive in the cache during all the non-first iterations of the loop in $foo_i()$.

Let $(\hat{c}''^{\cap}, \hat{a}''^{\cap})$ be the data flow information of $foo_i()$ preceding the second call and
$m \in \hat{c}''^{\cap}(f_k)(l_y)$, where $0 \le x \le y \le A - 1$. If $y + \hat{a}_t^{\cap}(f_k) \le A - 1$, then the memory
block $m$ will survive in the cache during the second call. Consequently, if the data flow
information of $foo_i()$ just after the second call is $(\hat{c}'''^{\cap}, \hat{a}'''^{\cap})$, then $m \in \hat{c}'''^{\cap}(f_k)(l_z)$
with $z = y + \hat{a}_t^{\cap}(f_k)$. However, according to the analysis $m \in \hat{c}'''^{\cap}(f_k)(l_x)$. Hence,
there may exist multiple copies of $m$ in the same set with different ages. In order to
avoid such a situation we compute $blocks(\hat{c}_t^{\cap}(f_k))$ and flush all existing memory blocks
of the callee from the cache during cache updates.




Fig. 6. Cache-residue by taking safe approximations of the cache damages of called external 
procedure. 



8.4 Example 

A typical example of cache updates during a module call, considering the cache damage
of the called external procedure, is shown in Fig. 6. A procedure $foo_i()$ of module $M_i$
has a call to the procedure $foo_j()$ of module $M_j$. The data flow information of $foo_j()$
is already computed and is available at its return point. For simplicity, we consider one
set $f_0$ with four cache lines, where

- $\hat{c}_t^{\cap}(f_0)$ is the set of all abstract must cache states of $foo_j()$ at the return point and
the upper bound of replacements is determined as $\hat{a}_t^{\cap}(f_0) = 3$.

- $\hat{c}_t^{\cup}(f_0)$ is the set of all abstract may cache states of $foo_j()$ at the return point and
the lower bound of replacements is determined as $\hat{a}_t^{\cup}(f_0) = 2$.

- $\hat{c}_0^{\cap}(f_0)$ is the set of all abstract must cache states of $foo_i()$ preceding the call and
the upper bound of replacements at this point is determined as $\hat{a}_0^{\cap}(f_0) = 3$.

- $\hat{c}_0^{\cup}(f_0)$ is the set of all abstract may cache states of $foo_i()$ preceding the call and
the lower bound of replacements at this point is determined as $\hat{a}_0^{\cup}(f_0) = 2$.

The cache damage update functions $G^{\cap}$ and $G^{\cup}$ are used at the calling point. They
update the must and the may cache information as well as the upper and the lower bound
for the replacements, respectively, where

- $\hat{c}'^{\cap}(f_0)$ is the set of all new abstract must cache states of $foo_i()$ just after the call
and the upper bound of replacements at this point is determined as $\hat{a}'^{\cap}(f_0) = 3 \oplus 3 = 4$.

- $\hat{c}'^{\cup}(f_0)$ is the set of all new abstract may cache states of $foo_i()$ just after the call and
the lower bound of replacements at this point is determined as $\hat{a}'^{\cup}(f_0) = 2 \oplus 2 = 4$.



9 Properties of the Method 



The focus of the analysis discussed above was on how to analyze the cache behavior
of a real-time program using a module-wise approach. The analysis technique works in
a bottom-up way, taking safe approximations of the cache damages of called external
procedures. Let a module $M_n$ contain a set of procedures $foo_1(), foo_2(), \ldots, foo_n()$.
The following steps are followed during the analysis of the module:

- Construct the control flow graph for each procedure $foo_i()$.

- Identify the local and the external calls.

- Analyze all possible paths, taking the local and external calls of a procedure into
account.

- Update the cache according to the call contexts, taking safe approximations of the 
cache damages of called external procedures. 

- Store the cache analysis information for each procedure in the corresponding module 
analysis result table. 

The analysis result of the complete program is the composition of the analysis results 
of all the modules. We have the following properties of the method. 
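The bottom-up order in which modules are analyzed can be sketched with a topological sort. The sketch below assumes an acyclic module dependency graph given as a dictionary from each module to the set of modules it calls; the function name and module names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def analysis_order(calls):
    """Return an order in which every module comes after all modules it calls.

    `calls` maps a module name to the set of modules it calls. A cyclic
    dependency makes TopologicalSorter raise graphlib.CycleError.
    """
    return list(TopologicalSorter(calls).static_order())


calls = {
    "main_mod": {"io_mod", "math_mod"},  # contains the main procedure
    "io_mod": {"math_mod"},
    "math_mod": set(),                   # calls no other module
}
order = analysis_order(calls)
```

In this example the module without external calls is analyzed first and the module containing the main procedure last, so the callee's analysis result table is always available at each external call site.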



9.1 Termination of the Cache-Damage Analysis 

Clearly the domain of the cache-damage analysis is finite. Hence, every ascending chain
is finite. The abstract memory access update functions $up$ are monotone because $\hat{U}$ and
$Aup$ are monotone. Moreover, the join functions $Ajoin$ are monotone, associative and
commutative. This guarantees the termination of the analysis.

9.2 The Results of the Module-Wise Analysis Are Conservative

In the module level analysis we start the analysis from the maximal elements in the module
dependency graph and gradually traverse the graph towards the module which contains
the main procedure. During the cache analysis of each procedure, we start by assuming
that all cache lines contain $\top$ for the must cache and that everything may be in the cache
with age 0 for the may cache. Since $s(l_x)$ gives more precise information than $\top$ for
an arbitrary cache line $l_x \in L$, the assumption of the must analysis is safe. The may
analysis is used to determine whether a memory block will never be in the cache; for this
we compute the set of all memory blocks that may be in the cache. Therefore the
assumption of the may cache also gives a safe approximation. From the discussion above,
we obtain the following theorem:

Theorem 1. The results of component-wise cache behavior prediction are conservative
with respect to the analysis results of a fully linked executable.

Proof. Let $V$ be the set of all procedures in the complete program. Consider the external
dependency call graph $(V', E)$, where the set of nodes $V' \subseteq V$ is the set of all procedures
which either contain a call site to another module or are called by some procedure from
another module. The set $E$ is the set of edges such that $(p_i, p_j) \in E$ if and only if there is
a call site in procedure $p_i \in M_i$ which calls procedure $p_j \in M_j$ such that $i \ne j$. Let $s_i$
and $t_i$ denote the entry and the return point of procedure $p_i$, respectively. Consider an
arbitrary path $p_1 \to p_2 \to \cdots \to p_n$ in $(V', E)$. We have to prove that at each node the
cache behavior analysis results are conservative. Indeed, it is sufficient to prove that at
an arbitrary external calling point the produced cache effect results are conservative.
Let $(p_i, p_j) \in E$, where procedure $p_j$ is already analyzed and we are looking at the
calling point of procedure $p_i$. We consider only the must cache:

Let $\mathcal{F}_{p_j} : \hat{C}^{\cap} \times \hat{\mathcal{A}}^{\cap} \to \hat{C}^{\cap} \times \hat{\mathcal{A}}^{\cap}$ be the effect of procedure $p_j$, where $\mathcal{F}_{p_j}$ is constructed
by applying the monotone functions $up$, $Ajoin$ sequentially according to the flow in
the control flow graph and hence is monotone.

Let $(\hat{c}_0^{\cap}, \hat{a}_0^{\cap})$ be the data flow information of $p_i$ just before the call to the procedure $p_j$.
The data flow information $(\hat{c}_t^{\cap}, \hat{a}_t^{\cap}) = \mathcal{F}_{p_j}(\top, 0)$ of procedure $p_j$ at its return point $t_j$ is
already computed. In the case of a fully linked executable, the data flow information of
$p_i$ just after the call to the procedure $p_j$ is $(\hat{c}_0'^{\cap}, \hat{a}_0'^{\cap}) = \mathcal{F}_{p_j}(\hat{c}_0^{\cap}, \hat{a}_0^{\cap})$. But, in our module level
analysis, the data flow information of $p_i$ just after the call to the procedure $p_j$ is

$$ (\hat{c}'^{\cap}, \hat{a}'^{\cap}) = G^{\cap}((\hat{c}_0^{\cap}, \hat{a}_0^{\cap}), (\hat{c}_t^{\cap}, \hat{a}_t^{\cap})). $$

It remains to show that $\hat{c}_0'^{\cap} \sqsubseteq_{\hat{C}^{\cap}} T_p(\hat{c}'^{\cap})$ (definition of $T_p$, cf. Sec. 7.2).

To show this we have to show that $\forall f_k \in F : \hat{c}_0'^{\cap}(f_k) \sqsubseteq_{\cap} T_p(\hat{c}'^{\cap})(f_k)$.

We know from Sec. 7.2 that the memory blocks in the abstract sets are only compared
to each other and are not influenced by the placements of the object modules. Let us
look at the data flow information $(\hat{c}_t^{\cap}, \hat{a}_t^{\cap})$ of procedure $p_j$. For each set $f_k \in F$ a safe
upper bound of replacements $\hat{a}_t^{\cap}(f_k)$ is computed, where $\hat{a}_t^{\cap}(f_k) \ge a_t(f_k)$. Because
of the LRU replacement strategy, elements of the corresponding set $f_k$ of $\hat{c}_0^{\cap}$ are aged by
at most $\hat{a}_t^{\cap}(f_k)$. Consequently, the lowest $(A - \hat{a}_t^{\cap}(f_k))$ memory blocks of the set $f_k$ of
$\hat{c}_0^{\cap}$ survive in the cache. It follows that

$$ \forall m_j \in T_p(\hat{c}'^{\cap})(f_k)(l_x) : \exists l_y \in L \text{ such that } m_j \in \hat{c}_0'^{\cap}(f_k)(l_y) \text{ with } y \le x. $$

Hence $\forall f_k \in F : \hat{c}_0'^{\cap}(f_k) \sqsubseteq_{\cap} T_p(\hat{c}'^{\cap})(f_k)$ holds.

Consequently $T_p(\hat{c}'^{\cap})$ gives the safe approximation of $\hat{c}_0'^{\cap}$. A similar result can be
proved for the may analysis. □



9.3 Maximum Wasted Memory Space 

We have seen in Sec. 7.1 that some memory space is wasted during program execution
because of the equivalent memory allocation w.r.t. the fixed set mapping. The memory
space wasted between modules and is calculated by for all 1 < t < p — 1. 
In the worst case, if TOq and to}, both map to the set /o for all 1 < t < p, memory 
space wasted is at most (2“ — /) • (p — 1). 



10 Related Work 

Most of the research on precise cache-behavior prediction is being performed on fully 
linked executables. Kaustubh S. Patil did a master’s thesis [Pat03] on compositional 
static cache analysis using module-level abstraction. In his work he considered four 
types of analyses on each module. In our view, neither analysis ignoring call nor con- 
sidering call captures the proper cache effects during a module call. A research group 
at the Laboratory of Embedded Systems Innovation and Technology (LIT) described 
in [RGL02] a framework, PERF, which works with the object code generated by the 
integrated tools in order to determine execution-time limit estimations for functions that 
compose a real-time system. Their cache behavior prediction method is based on the 
extended timing schemata proposed by [SSea94,Sea94]. 



11 Conclusions and Future Work 

We have presented a technique for predicting the cache behavior for A-way set associative 
instruction caches component-wise. Given a set of object code-modules, a parser reads 
the object code-modules and reconstructs the control flow. The cache analysis technique 
works in a bottom-up way starting from maximal elements of the module dependency 
graph. The analysis computes a sound approximation to the cache contents at all program 
points of all modules taking safe approximations of the cache damages of called external 
functions into account. The analysis results can be combined in a conservative way with 
respect to an analysis of a fully linked executable. 

Our current research direction includes component-wise data cache behavior prediction.
Data cache analysis is more difficult than instruction cache analysis, because
the effective data address may change when an instruction referencing data is executed
repeatedly. We will implement a tool to estimate the worst-case execution time of a
real-time system, where the system is given as a set of object code modules.

Abstract. The paper presents an approach to the translation validation
of an optimizing compiler which translates synchronous C programs into
machine code programs. Being synchronous means that both source and
target programs are loop free. This enables representation of each of
these programs by a single state transformer, and verification of the
translation correctness is based on comparison of the source and target
state transformers. The approach has been implemented in a tool called
MCVT, which is also described.



1 Introduction 

In recent years there has been a growing awareness of the importance of formally proving
the correctness of safety critical portions of software systems. The industry is
taking first steps towards the use of formal methods for the validation of real systems.
At the same time the importance of compiler optimizations is increasing, together
with the development of modern CPU architectures. Consequently, the validity
of the optimized code is not evident. This is the reason that standards and
regulations require a compiler to be qualified before it is used in a safety critical
system. In other cases, a manual check of the translated code is required. Dealing
with a commercial compiler poses new obstacles for the code validation effort.
The intermediate representation is not available to the verifier and a major part
of the knowledge about optimizations is confidential. The available compiler
output is a standard object file format or assembly code, and standard debug
information. However, when optimizations are activated, the debug information
is only partially reliable.

The research described in this paper is part of the SafeAir-II project of IST
(Information Society Technologies program of the European Union), which aims
towards the development of Advanced Design Tools for Safety Critical Software.
The tool described here is a tool for translation validation of an optimizing
compiler. This tool accepts a source program and the corresponding target code
produced by the compiler and validates that the target code is a correct translation
of the source program. The compiler that was chosen is the WindRiver DiabData
[Dia] compiler for the PowerPC [PPC02] family of processors, a compiler which
is widely used in the development of safety-critical embedded systems.

* This research was supported in part by the Minerva Center for Verification of
Reactive Systems, IST project SafeAir-II, and NSF grant CCR-0205571.

The major challenge of this research is to handle an industrial environment,
where the source language, the processor, the compiler as well as the
applications are all industrial grade. In order to be useful to the industrial
sector, a fully automatic process is obligatory. The programs we handled
were C programs, automatically produced from the synchronous language
SCADE [Ver99], which is used to design embedded systems. Synchronous languages
such as SCADE provide means to describe reactive systems with finite or infinite
computations. The C programs that are produced are composed of compilation
units that have a special structure. Except for the main program, which contains
a single loop that repeats forever, all other units (procedures) and the body of the
main program's loop are practically loop-free. That is, the only nested loops they
may contain have a fixed constant number of iterations (known at compile time).
On the other hand, they may contain an arbitrarily complex conditional control
structure. As a result, the produced machine code contains a significant number
of conditional branches, and thus the number of possible paths through these
programs is large, though their length is bounded. These properties
enable us to describe the effect of executing the program at each cycle (a single
iteration of the main program's loop) by an STS (State Transformer System).

We propose a proof procedure which is capable of proving the translation 
correctness without previous knowledge about the optimizations performed by 
the compiler, and the compiler internal intermediate representation. When deal- 
ing with programs with more general control structure, e.g. non-trivial nested 
loops, the general approach to translation validation (e.g. [ZPFG02]) requires the 
construction of a control abstraction mapping as well as local (non-observable) 
variables mappings. In our more restricted context of synchronous control pro- 
grams, these constructions are unnecessary. Instead, we present a special algo- 
rithm (called Annotation) which, in a forward traversal of a loop-free program, 
constructs a state transformer which expresses all the possible data transforma- 
tions this program may effect in a single execution. 

The MCVT tool (Machine Code Validation Tool) described here uses
the proof procedure, together with the Annotation algorithm and additional
heuristics, to automatically produce verification conditions. The verification
conditions are formulas of decidable logic theories. MCVT uses CVC, a Cooperating
Validity Checker [SBD02] developed at Stanford University, to establish the
validity of the verification conditions, and thus the correctness of the translation.

Section 3 contains the theoretical framework, including the definitions of state
transformer system, correctness of translation and a reduced procedure to prove
it for synchronous programs. In Section 4 we introduce the source synchronous
programs with their inherent restrictions, and present the syntax and semantics
of synchronous C programs. Section 5 presents the syntax and semantics
of synchronous machine code programs. Subsection 5.2 describes a linear time
algorithm to construct the STS corresponding to a synchronous machine code
program. We conclude with a presentation of the MCVT tool, the heuristics and
algorithms incorporated into it, and results of using the tool.

2 Related Work 

Several methods were proposed in order to mechanically solve the translation
validation problem. Most methods start by constructing a data abstraction mapping.
Then, either proof obligations or checkers are produced. The proof obligations
can be validated by an "off the shelf" validity checker, or by a theorem
prover [ZG97,GZ99,GZG00,GZ00]. In other cases, the validation process is based
on heuristics and on interaction between the heuristic part and the decision
procedures, or the formulas' simplifier [Nec00,RM00]. Yet another approach lets the
compiler produce the proofs that are needed to establish its correctness [RM00,
GZ00].

The CVT tool [PSS98b,PSS99] is an example of an industrial quality translation
validation tool. It demonstrates an automatic procedure for validating the
translation from the synchronous language SCADE to a sequential C program.
The main concerns addressed by this tool are the semantic difference between a
synchronous language and sequential C code, as well as the data abstraction
mapping. The proof obligations that are produced include equalities and
uninterpreted functions, relying on the fact that the compiler does not change the
order in which arithmetic functions are applied. The final validation is done by a decision
procedure developed specially for the CVT tool. This paper can be considered as
a continuation of the CVT project by validating the next link in the development
sequence: from C programs produced by the SCADE tool to optimized machine
code.

The VOC project [ZPFG02] concentrates on the translation of C programs
by an optimizing compiler, where the investigated optimizations are machine
independent. The language of both source and target systems is the intermediate
representation of the compiler. Two different proof rules are presented: one for
"structure preserving" transformations and another which handles loop
transformations. The first proof rule deals with "structure preserving" transformations and
is based on Floyd's inductive-assertion method [Flo67]. The other proof rule
deals with non-"structure preserving" transformations, which are permutations
of the loop iterations' order. The method we employ for our validation bears
some resemblance to the structure preserving part. However, due to the special
restrictions intrinsic to synchronous programs, we can employ a significantly
simpler and more efficient approach, which is described in this paper.

Other works are done as part of the Verifix project [GZ99,GZG00]. These rely
on the PVS theorem prover [GOR+95] in order to mechanize the verification, but
full automation is not supported. One part of the Verifix project concentrates
on compiler back-ends. It formalizes the semantics of the assembly code, as well
as of an intermediate representation, using higher-order logics which are supported by
PVS, whereas our goal is to use formulas whose validity is decidable. Another
aspect of the Verifix project which relates to our work is the program checker
approach. This approach, when applied to compiler verification, is the same as
translation validation, since a validation checker is produced for each translation.
However, contrary to our work, the checker is produced by the compiler. In
the work described in [RM00], it is the compiler that produces the proofs, but
the validity of the proof is established by the tool itself. The correctness proof
consists of two sub-proofs: a sub-proof which shows that the analysis of the
input program produces a correct result, and a sub-proof that establishes an
equivalence between the original and the transformed programs.

Another method is described in [Nec00]. It is based on heuristics and a special
purpose simplifier. The latter is composed of inference rules and is built in a 
tool. The tool is able to validate translations automatically. It can handle a wide 
range of optimizations, but as reported, may fail for loop unrolling and does not 
handle software pipelining. This work also uses the intermediate representation 
and validates each optimization pass separately. 

While we choose to validate the translation, some works formally validate 
the translator itself. For example, this approach is used in [SSB01], where ASM
(Abstract State Machines) are used to define the semantics of the source (Java) 
and target (JVM) systems. Compiler formal specifications are provided as well 
as a proof of the correctness of this compiler with respect to the Java and JVM 
specifications. The compiler correctness proof is done manually. 

All the above mentioned works use compiler assistance, to some degree. It is a 
major challenge in translation validation research to minimize this dependence. 



3 State Transformers and Their Refinement 

To define the notion of correct translation between synchronous programs, we 
need to define the semantics of the source and target programs. To express the 
semantics of synchronous programs, we use State Transformer Systems (STS), 
which are a degenerate case of the synchronous transition systems introduced 
in [PSS98a]. 



3.1 State Transformer System 

A state transformer system $S : \langle V, O, F \rangle$ consists of the following components:

- $V$: a finite set of typed system variables. A type-consistent interpretation
of the variables $V$ is a system state. We denote by $\Sigma_V$ the set of system states.
For a state $s \in \Sigma_V$ and variable $v \in V$, we denote by $s[v]$ the value assigned
to $v$ by $s$.

- $O \subseteq V$: a set of observable variables. The observable variables are the
variables which are visible to the external world. A state-observation is a
projection of a state $s \in \Sigma_V$ on the observable variables and is denoted by
$s \Downarrow_O$.

- $F : \Sigma_V \to \Sigma_V$: a state transformer, expressing the state transformation
effected by the system. If the system variables are $V = \{v_1, \ldots, v_k\}$, then $F$
consists of the list of functions $F_1, \ldots, F_k$, where each $F_i$ maps $\Sigma_V$ into the
domain of $v_i$. We denote by $F_O$ the restriction of the list of functions to the
functions corresponding to the observable variables.

A computation of an STS is a pair $\sigma = (s_0, s_1)$ where $s_1 = F(s_0)$. A pair of $O$-states
$o_0, o_1 \in \Sigma_V \Downarrow_O$ is called an observation of $S$ if there exist $s_0, s_1 \in \Sigma_V$ such
that $o_0 = s_0 \Downarrow_O$, $o_1 = s_1 \Downarrow_O$, and $(s_0, s_1)$ is a computation of $S$.

3.2 Transformer Composition 

In many cases, we have to compose two transformation functions. For example,
we may have a concatenation of two statements $S_1; S_2$ and the transformers
$F_{S_1}$, $F_{S_2}$, corresponding to statements $S_1$, $S_2$, respectively. The transformer for
the concatenation $S_1; S_2$ will be given by the composition

$$ F_{S_1; S_2} = F_{S_1} \circ F_{S_2} $$

In general, assume the system variables $V = \{v_1, \ldots, v_m\}$, and the two transformers
$F^1, F^2 : \Sigma_V \to \Sigma_V$. Then the composition $F^1 \circ F^2$ is defined by

$$ F^1 \circ F^2 = \lambda V. F^2(F^1(V)) $$

In particular, if $F = F^1 \circ F^2$, then $F_j(V) = F_j^2(F^1(V))$, for $j = 1, \ldots, m$.

For example, assume that $V = \{a, b\}$, statement $S_1$ is the assignment
$b := b + 1$ and statement $S_2$ is the multiple assignment $(a, b) := (2 * b, a + b + 1)$.
Then the respective transformers are given by

$$ F_{S_1} : (a, b + 1) $$

$$ F_{S_2} : (2 * b, a + b + 1) $$

$$ F_{S_1; S_2} = F_{S_1} \circ F_{S_2} : (2(b + 1), a + b + 2) $$
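The worked example can be checked mechanically. In the sketch below a state is a Python tuple `(a, b)` and a transformer is a function on such tuples; the helper name `compose` and this encoding are assumptions made for illustration. Note that, in the convention used here, the left operand of the composition is applied first.

```python
def compose(f1, f2):
    """Transformer composition F1 o F2: apply f1 first, then f2."""
    return lambda state: f2(f1(state))


# S1: b := b + 1
f_s1 = lambda s: (s[0], s[1] + 1)
# S2: (a, b) := (2 * b, a + b + 1), a simultaneous assignment
f_s2 = lambda s: (2 * s[1], s[0] + s[1] + 1)

f_seq = compose(f_s1, f_s2)
```

For every state `(a, b)`, `f_seq((a, b))` agrees with the closed form `(2 * (b + 1), a + b + 2)` derived above.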



3.3 Refinement Between STS’s 

A correct translation needs only to ensure the preservation of the observable
behavior [ZG97]. A target program is a correct translation of a source
program if every observation that exists in the target system also exists in
the source system. This characteristic is formalized by the notion of refinement
[EdR+99]. Let $S^S = \langle V^S, O^S, F^S \rangle$ and $S^T = \langle V^T, O^T, F^T \rangle$ be source and target
STS's. Let $R \subseteq \Sigma_{V^S} \Downarrow_{O^S} \times \Sigma_{V^T} \Downarrow_{O^T}$ be a relation between source state observations
and target state observations. We say that $S^T$ is an $R$-refinement of $S^S$ if, for
every source and target states $s \in \Sigma_{V^S}$, $t \in \Sigma_{V^T}$,

$$ R(s \Downarrow_{O^S}, t \Downarrow_{O^T}) \text{ implies } R(F_O^S(s), F_O^T(t)) $$

Equivalently, for every $s_1, s_2 \in \Sigma_{V^S}$ and $t_1, t_2 \in \Sigma_{V^T}$ such that $s_2 = F^S(s_1)$ and
$t_2 = F^T(t_1)$, if the observable parts of $s_1, t_1$ are $R$-related, then so are the
observable parts of $s_2, t_2$.

System $S^T$ is a refinement of $S^S$ if there exists a refinement relation $R$ such
that $S^T$ is an $R$-refinement of $S^S$.

3.4 Verifying the Correctness of Translation: The compare Procedure

Let $S^S : \langle V^S, O^S, F^S \rangle$ and $S^T : \langle V^T, O^T, F^T \rangle$ be the source and the target STS's,
respectively. We assume that the observable variables of the two systems are
$O^S : \{V_1, \ldots, V_k\}$ and $O^T : \{v_1, \ldots, v_k\}$, and that there exists a 1-1 correspondence
between these variables. Furthermore, we will restrict our attention to the
case that the refinement mapping $R$ is given by the conjunction of equalities
$V_1 = v_1 \wedge \cdots \wedge V_k = v_k$. Under these assumptions, we can use the following
procedure compare to verify that $S^T$ is a refinement of $S^S$:

1. Construct a data mapping $\alpha : V_1 = v_1 \wedge \ldots \wedge V_i = v_i \wedge \ldots \wedge V_k = v_k$.

2. For each observable variable $V_i$ of the source system, $i = 1, \ldots, k$, construct
a verification condition $VC_i$:

$$ VC_i : \alpha \rightarrow F_i^S(V^S) = F_i^T(V^T) $$

3. Establish the validity of all the verification conditions.
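The three steps can be mimicked on toy transformers. The sketch below replaces the validity checker by exhaustive enumeration over a small finite domain, which is an assumption made purely for illustration and is not how MCVT works (the paper uses CVC on decidable formulas). States are tuples, and `obs_pairs` (a hypothetical name) lists which source index corresponds to which target index.

```python
from itertools import product


def compare(f_src, f_tgt, n_src, n_tgt, obs_pairs, domain):
    """Check every VC_i by brute force; return None if all hold,
    otherwise a counterexample (source index, source state, target state)."""
    for s in product(domain, repeat=n_src):
        for t in product(domain, repeat=n_tgt):
            # alpha: corresponding observables are equal before the step
            if all(s[i] == t[j] for i, j in obs_pairs):
                s2, t2 = f_src(s), f_tgt(t)
                for i, j in obs_pairs:
                    if s2[i] != t2[j]:
                        return (i, s, t)
    return None


# Source: x := x + y. The target has an extra, non-observable register.
src = lambda s: (s[0] + s[1], s[1])
good_tgt = lambda t: (t[1] + t[0], t[1], t[0])
bad_tgt = lambda t: (t[0] - t[1], t[1], 0)
pairs = [(0, 0), (1, 1)]
```

Here `compare(src, good_tgt, 2, 3, pairs, range(3))` finds no violation, while the same call with `bad_tgt` returns a counterexample for the first observable.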

4 The Source Synchronous Programs 

The source programs we handle are produced from a high-level synchronous 
language SCADE [Ver99]. In this language, a node is a network of operators. A 
node has a formal interface (input and output variables) and local variables. The 
formal interface defines the observable part of the node. The node represents a 
computation of the values of the output variables as functions of the input vari- 
ables. Each node has its own distinct output. A SCADE program is a collection of 
nodes, which are activated by an external loop. The input variable values that
are used in each iteration are the values that were computed in the previous
iteration. Thus the order of running the nodes has no effect on the output of the
program. Each node is translated into one compilation unit of the C language. 
The C code of the different nodes is compiled into a machine code. The code 
of each node usually contains no loops. In rare cases, it may contain loops with 
constant number of iterations which can easily be unrolled. Our notion of syn- 
chronous program relates to a program that represents one node, thus containing 
no loops. We define two types of synchronous programs: synchronous machine 
code programs, and synchronous C programs. 



4.1 Synchronous C Programs 

Synchronous C programs are C programs without loops. The following statements
are not allowed: while, for, do-while and goto. The syntax of synchronous
C programs is defined as follows:

Let $V = \{v_1, \ldots, v_m\}$ be a set of integer variables, then

- For variable $v_i \in V$ and expression $E$ over $V$, "$v_i := E$" is an assignment
statement.

- For boolean expression $C$ and statements $S_1$, $S_2$, "if $C$ then $S_1$ else $S_2$"
is a conditional statement.

- If $S_1, \ldots, S_k$ are statements then "$S_1; \ldots; S_k$" is a concatenation
statement.

Given a synchronous C program, we can define its semantics as an STS. We take
$V$ as the set of system variables. As observables, we take an arbitrary subset
$O \subseteq V$. The transformer function $F$ is defined inductively for each statement,
as follows:

- For an assignment $S = v_i := E$, the transformer is given by

$$ F_S(V) = (v_1, \ldots, v_{i-1}, E, v_{i+1}, \ldots, v_m). $$

- For a conditional $S = $ if $C$ then $S_1$ else $S_2$, the transformer is given by

$$ F_S = (ite(C, F_1^1, F_1^2), \ldots, ite(C, F_m^1, F_m^2)), $$

where $ite(C, a, b)$ is an if-then-else expression, and $F^1 = (F_1^1, \ldots, F_m^1)$,
$F^2 = (F_1^2, \ldots, F_m^2)$ are the transformers for $S_1$, $S_2$, respectively.

- For a concatenation $S = S_1; \ldots; S_k$, the transformer is given by

$$ F_S = F_{S_1} \circ \ldots \circ F_{S_k}. $$
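The inductive definition can be mirrored by three small combinators. This is a sketch under the assumption that a state is a tuple of integers and that expressions and conditions are Python functions of the state; the names `assign`, `cond` and `seq` are invented for the example.

```python
def assign(i, expr):
    """Transformer for v_i := E: replace component i by E evaluated on the state."""
    return lambda v: v[:i] + (expr(v),) + v[i + 1:]


def cond(c, f1, f2):
    """Transformer for 'if C then S1 else S2': a component-wise if-then-else."""
    def f(v):
        r1, r2 = f1(v), f2(v)
        return tuple(a if c(v) else b for a, b in zip(r1, r2))
    return f


def seq(*fs):
    """Transformer for S1; ...; Sk: apply the transformers from left to right."""
    def f(v):
        for g in fs:
            v = g(v)
        return v
    return f


# State (x, y): if x > y then x := x - y else y := y - x; y := y + 1
prog = seq(
    cond(lambda v: v[0] > v[1],
         assign(0, lambda v: v[0] - v[1]),
         assign(1, lambda v: v[1] - v[0])),
    assign(1, lambda v: v[1] + 1),
)
```

Since the language has no loops, evaluating `prog` on a state always terminates after a number of steps bounded by the program size.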



5 The Target Synchronous Languages 

As the target language we take machine code programs. 

5.1 Synchronous Machine Code Programs 

A Machine Code Program is composed of machine instructions. Each instruction
has a label, which is a non-negative integer. We allow three types of instructions:

- For a variable $v$ and an expression $Exp$ of the same type, "$v := Exp$"
is an assignment instruction. $Exp$ can include calls to functions.

- For a non-negative integer $n$, "branch $n$" is an unconditional branch
instruction.

- For a non-negative integer $n$ and a boolean expression $C$, "branch ($C$) $n$"
is a conditional branch instruction with condition $C$ and destination $n$.

A machine code program $P$ is a sequence of instructions $P = Inst_0, \ldots, Inst_n$. A
branch to a label $\ell > n$ is interpreted as program termination, and
is equivalent to a branch with destination $n + 1$. To guarantee that the program
contains no loops, we require that the destination $d$ of a branch instruction
(whether conditional or unconditional) labeled $\ell$ satisfies $d > \ell$, i.e. we allow
only forward branching.

5.2 The STS Semantics of Machine Code Programs 

We apply forward analysis to a synchronous program $P = Inst_0, \ldots, Inst_n$,
with a set of variables $V = \{v_0, \ldots, v_k\}$, in order to produce its corresponding
state transformer. The transformer is computed incrementally. At each step, one
instruction is considered, and a transformer is computed, based on the transformer
computed for the preceding instruction. We refer to this incremental
computation as the Annotation algorithm.

Let $P = Inst_0, Inst_1, \ldots, Inst_n$ be a program, where each $Inst_i$, $i = 0, \ldots, n$,
is a machine instruction. Assume that $V = \{v_1, \ldots, v_m\}$ are the program
variables. The Annotation algorithm computes the annotations $Ann[0], \ldots, Ann[n]$
and the transformers $F^0, \ldots, F^{n+1}$ inductively. For each instruction
$i = 0, \ldots, n$, the annotation $Ann[i]$ expresses the condition under which instruction
$Inst_i$ is executed, in terms of the input variables (the values of the variables
before the program is started). The transformer $F^{i+1}$ expresses the state
transformation effected by the program $P^i = Inst_0, \ldots, Inst_i$, where $F^{n+1}$ is the
transformer for the entire program. The initial transformer $F^0$ has the trivial
form

    $F^0(V) = (v_1, \ldots, v_m)$

The Annotation Algorithm - This algorithm scans the program from
$Inst_0$ to $Inst_n$, in order to compute $F$, the transformer of program $P$.

1. Initially:

    $Ann[0] = true$, $Ann[1] = \ldots = Ann[n] = false$, $F^0 = \lambda V. V$.

2. For $i = 0, \ldots, n$, examine $Inst_i$:

  a) If $Inst_i$ is an assignment instruction $i : v_j := E$,
     then do:

       $Ann[i+1] := Ann[i+1] \vee Ann[i]$
       $F^{i+1} := F^i \circ \lambda V.(v_1, \ldots, v_{j-1}, ite(Ann[i], E, v_j), v_{j+1}, \ldots, v_m)$

  b) If $Inst_i$ is a conditional branch instruction branch ($C$) $j$,
     then do:

       $Ann[i+1] := Ann[i+1] \vee (Ann[i] \wedge \neg C(F^i(V)))$
       $Ann[j] := Ann[j] \vee (Ann[i] \wedge C(F^i(V)))$
       $F^{i+1} := F^i$

  c) If $Inst_i$ is an unconditional branch instruction branch $j$,
     then do:

       $Ann[j] := Ann[j] \vee Ann[i]$
       $F^{i+1} := F^i$

Finally, $F$ is taken to be $F^{n+1}$.
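The following Python sketch mirrors the three cases of the Annotation algorithm on a toy instruction encoding. It is a minimal illustration under stated assumptions, not the MCVT implementation: the tuple forms `("assign", var, expr)`, `("cbranch", cond, dest)` and `("branch", dest)`, and the helper names `subst`, `or_`, `and_` and `annotate`, are all hypothetical. Annotations are kept as symbolic boolean expressions with only constant folding, so the result also shows why a simplifier is needed afterwards.

```python
def subst(expr, state):
    """Replace each variable of expr by its current symbolic value."""
    if isinstance(expr, str):
        return state[expr]
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *[subst(a, state) for a in args])
    return expr  # integer constant


def or_(a, b):
    """Disjunction with folding of the constants True and False."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


def and_(a, b):
    """Conjunction with folding of the constants True and False."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def annotate(program, variables):
    """Return (annotations, final symbolic state) for a forward-branching program."""
    n = len(program)
    ann = [False] * (n + 1)          # slot n stands for program termination
    ann[0] = True
    state = {v: v for v in variables}  # the initial transformer is the identity
    for i, inst in enumerate(program):
        kind = inst[0]
        if kind == "assign":
            _, var, expr = inst
            ann[i + 1] = or_(ann[i + 1], ann[i])
            value = subst(expr, state)
            if ann[i] is True:
                state[var] = value
            elif ann[i] is not False:
                state[var] = ("ite", ann[i], value, state[var])
        else:
            dest = min(inst[-1], n)  # labels past the end mean termination
            if dest <= i:
                raise ValueError("only forward branches are allowed")
            if kind == "cbranch":
                cond = subst(inst[1], state)
                ann[i + 1] = or_(ann[i + 1], and_(ann[i], ("not", cond)))
                ann[dest] = or_(ann[dest], and_(ann[i], cond))
            else:                    # unconditional branch
                ann[dest] = or_(ann[dest], ann[i])
    return ann, state


# if (v2 + 3 = 1) then v1 := 1 else v1 := 0, written with forward branches
program = [
    ("cbranch", ("=", ("+", "v2", 3), 1), 3),
    ("assign", "v1", 0),
    ("branch", 4),
    ("assign", "v1", 1),
]
ann, state = annotate(program, ["v1", "v2"])
```

For this program the annotation of the exit point comes out as `(not c) or c`, with `c` standing for `v2 + 3 = 1`; it is logically true but is not reduced by constant folding alone.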

6 The MCVT Tool 

MCVT, the machine-code validation tool, was developed to validate the translation
of synchronous programs. The source programs are C programs, produced
automatically from SCADE synchronous programs. The target program is binary
optimized code, produced by the commercial compiler DiabData of WindRiver,
and intended to run on a PowerPC computer. MCVT accepts the source and target
programs and checks that the target is a correct translation of the source.
If so, it prints "Valid"; otherwise it prints "Invalid". The whole process is fully
automatic, though it may take a long time.

In this section we describe the general architecture of MCVT. For clarity, we
use the symbol $F_v$ to represent the state transformer component that corresponds
to variable $v$ in the STS under discussion.

6.1 Compact Expression Representation 

Using a compressed state transformer function has the disadvantage of producing
huge expressions. For example, the transformer of the following machine
code program:

    P:
        0: r0 := v + 5;
        1: r1 := r0 * 10;
        2: v := r0 + r1;

is $\lambda(r_0, r_1, v).(v + 5,\ (v + 5) \cdot 10,\ v + 5 + (v + 5) \cdot 10)$. However,
internally, each expression has only one copy, and a pointer to this copy is
produced whenever the expression is used. Thus, the state transformer of program
P is represented by the compact tree that is shown in Fig. 1.



Fig. 1. Compact expressions tree
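The single-copy representation can be illustrated by interning expression nodes in a table, a technique often called hash-consing. The sketch below is an assumption about how such sharing might be coded, not a description of MCVT's internal data structures; the class name `ExprPool` and its `node` method are hypothetical.

```python
class ExprPool:
    """Interns expression nodes so that equal subexpressions share one object."""

    def __init__(self):
        self._nodes = {}

    def node(self, op, *children):
        key = (op, *children)
        # setdefault returns the stored node if an equal one already exists.
        return self._nodes.setdefault(key, key)

    def __len__(self):
        return len(self._nodes)


pool = ExprPool()
f_r0 = pool.node("+", "v", 5)                         # v + 5
f_r1 = pool.node("*", pool.node("+", "v", 5), 10)     # (v + 5) * 10
f_v = pool.node("+", pool.node("+", "v", 5), f_r1)    # v + 5 + (v + 5) * 10
```

Although `v + 5` is requested three times, the pool holds only three nodes in total, and the components for `r1` and `v` point to the same `v + 5` object.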



6.2 The Formulas’ Simplifier 

The annotations that are produced for machine-code programs tend to be big,
since each branch instruction adds a term. However, in many cases significant
simplifications are possible. A common pattern in the machine code is the
following one:

    l0: branch <cond> l2;
    l1: assignment instructions...
        branch l3;
    l2: assignment instructions...
    l3:

Such a pattern is the result of an if-else statement in the source C code. Let
$Ann[l_0] = true$. Then $Ann[l_1] = \neg cond$, $Ann[l_2] = cond$ and
$Ann[l_3] = cond \vee \neg cond = true$. In order to simplify the annotations, the
condition expressions are handled symbolically. MCVT replaces every condition
by a proposition symbol and simplifies the resulting formulas by resolution. The
resolution is not used to validate the formula, but rather to extract an equivalent
smaller one. A BDD simplifier could be used as well.

DNF formulas - Let $p_1, p_2, \ldots$ be a set of propositions. The set $\mathcal{L}$ of literals
is

    $\mathcal{L} ::= p \mid \neg p \mid true \mid false$.

The set $\mathcal{C}$ of clauses is

    $\mathcal{C} ::= l_1 \wedge l_2 \wedge \ldots \mid true \mid false, \quad l_i \in \mathcal{L}$.

The set $\mathcal{F}$ of DNF (Disjunctive Normal Form) formulas is

    $\mathcal{F} ::= C_1 \vee C_2 \vee \ldots \mid true \mid false, \quad C_i \in \mathcal{C}$.

DNF resolution [Rob65,BA01] - For clauses $C, C'$ and a literal $l$, let $C =
l \wedge C'$ be a clause, then $C - l = C'$. An empty clause is equivalent to true. If
a formula contains a true clause, the whole formula is equivalent to true. Let
$F = C \vee F'$ be a DNF formula, then $F - C = F'$ and $F = F' + C$. An empty
formula is equivalent to false. The resolution is based on the observation that
if

    $C_1, C_2 \in F, \quad l \in C_1, \quad \neg l \in C_2, \quad C_1 - l = C_2 - \neg l$

then

    $F \equiv F - C_1 - C_2 + (C_1 - l)$.
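The resolution step can be sketched as follows. This is an illustrative reading of the rule, not MCVT's simplifier: a literal is encoded as a pair `(name, polarity)`, a clause as a `frozenset` of literals, and a formula as a set of clauses; the function names `negate` and `simplify` are hypothetical.

```python
def negate(lit):
    """Return the complementary literal."""
    name, positive = lit
    return (name, not positive)


def simplify(formula):
    """Repeatedly replace clause pairs (l AND C') and (NOT l AND C') by C'."""
    clauses = set(formula)
    changed = True
    while changed:
        changed = False
        for c1 in list(clauses):
            for lit in c1:
                # c2 is c1 with lit replaced by its negation.
                c2 = (c1 - {lit}) | {negate(lit)}
                if c2 in clauses:
                    clauses -= {c1, c2}
                    clauses.add(c1 - {lit})
                    changed = True
                    break
            if changed:
                break
    if frozenset() in clauses:
        return {frozenset()}  # an empty (true) clause makes the formula true
    return clauses


cond = ("cond", True)
exit_annotation = simplify({frozenset({cond}), frozenset({negate(cond)})})
```

Applied to the exit annotation `cond OR NOT cond` of the if-else pattern, the result is the formula consisting of the single empty clause, that is, true. Each merge removes two clauses and adds one, so the loop terminates.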



6.3 The Code Pattern Recognizer 

Special Machine Instructions. Most compilers use only a subset of the processor
instruction set. Our implemented tool, as well as the formal semantics
definition, considers only those instructions that are used by the compiler. We
devote special attention to a few instructions which are used in a special way by
the compiler, and which require special treatment.

The Compare Instruction - The PowerPC processor has eight condition
registers (see [PPC02]), each consisting of a triple of bits. The instruction
compare i, <val1>, <val2> compares the values of <val1> and
<val2> and assigns values to condition register number i. The first bit of the
register is set if <val1> is greater than <val2>, the second bit is set if
<val1> is smaller than <val2>, and the third bit is set if both values
are equal. A bit in the condition register that was not set is reset. In the state
transformer system of a machine program, every bit of a condition register is
represented by a boolean variable. The set of condition registers is represented
by a two-dimensional array of boolean type elements cr[i,j] (also denoted as
$cr_{i,j}$), where $i = 0 \ldots 7$ and $j = 0, 1, 2$, which stand for eight triples. The value
of each array element is a boolean expression. The compare instruction assigns
a value to one condition register, which consists of three elements of the array. For
example, the transformer semantics for the instruction compare 1,0,r0 is:

    $\lambda(cr_{1,0}, cr_{1,1}, cr_{1,2}).((r_0 < 0), (r_0 > 0), (r_0 = 0))$

The machine code program in Fig. 2 demonstrates a use of the compare instruction.
The C source code that corresponds to this example is: v1 := (v2 + 3 = 1).
As the only observable variables are $v_1, v_2$, the corresponding observable part of
the machine code program's transformer is:

    $\lambda(v_1, v_2).(ite((v_2 + 3 = 1), 1, 0), v_2)$

Bit Instructions - Other interesting instructions are those that manipulate
bits. For example, the instruction rlwinm (Rotate Left Word i bits, then And
with Mask) is used by the compiler to extract some bits from an integer, or a
condition register. Its semantics is defined using the functions:

    bit_array : array of booleans $\mapsto$ integer
    extract_bits : integer $\times$ integer $\mapsto$ integer

However, when these functions are called in a special sequence with predefined
values, their common semantics becomes a specific known arithmetic
function. For example, the code in Fig. 3 computes the same value for $v_1$ as the
code in Fig. 2. The corresponding component is:

    $F_{v_1} = extract(bit\_array(cr\ with\ [2] : (v_2 + 3 = 1)), 2)$

which is recognized by MCVT as equivalent to:

    $F_{v_1} = ite((v_2 + 3 = 1), 1, 0)$

The optimizing compiler tries to choose the best machine-code sequence to
perform a high-level language operation. In some cases, it is a sequence of bit
manipulation instructions. These sequences are known patterns. We do not want
to use the bit manipulation semantics because it complicates the produced state
transformer. Therefore, MCVT recognizes these patterns when they are used
in the target machine code, and replaces them with equivalent high-level
operations.

            add      r0,v2,3     add 3 to v2 and put the result in integer register r0
            compare  cr0,r0,1    compare r0 to 1 and set the three bits of condition
                                 register cr0
            branch   cr0.2 l2    branch to l2 if bit no. 2 of cr0 is set
    l1:     move     v1,0        move 0 to v1
            branch   l3
    l2:     move     v1,1        move 1 to v1
    l3:

Fig. 2. Machine code example - v1 := (v2 + 3 == 1)

            add      r0,v2,3     add 3 to v2 and put the result in integer register r0
            compare  cr0,r0,1    compare r0 to 1 and set the three bits of condition
                                 register cr0
    l1:     move     r1,cr       move the content of the condition registers to
                                 integer register r1
            rlwinm   v1,r1,2,1   rotate the content of r1 into v1, so that bit no. 2
                                 moves to the least significant bit, and extract it

Fig. 3. Machine code example - Optimized Machine Code

Let the source C program contain the line A := (I = 0). In order to avoid
branch instructions, which are expensive in terms of performance, the compiler
produces the following instruction sequence:

    r0 := cntlzw I      ; Count the Number of Leading Zeros of I and assign to r0.
    A := rlwinm r0,5,1  ; Rotate r0 by 5, then extract the first bit.

The execution of these two instructions assigns A the value one if I equals zero,
and zero otherwise. This instruction sequence is identified and replaced by its
higher level STS semantics: $F_A = ite(I = 0, 1, 0)$.
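The reason this pattern computes `I = 0` can be checked with a small Python model. The sketch assumes 32-bit words and models the rotate-and-extract step as a right shift by 5 followed by a one-bit mask; this is an assumption about the intended effect, not an emulation of the PowerPC instruction encoding. A 32-bit word has 32 leading zeros exactly when it is zero, and 32 is the only possible count with bit 5 set.

```python
def cntlzw(x):
    """Count the leading zeros of a 32-bit word (32 when x is zero)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def is_zero_branch_free(i):
    """Model of the two-instruction sequence: 1 if i == 0, otherwise 0."""
    return (cntlzw(i) >> 5) & 1


samples = [0, 1, 2, 12345, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
agrees = all(is_zero_branch_free(i) == int(i == 0) for i in samples)
```

On the sample inputs the branch-free sequence agrees with the high-level semantics `ite(I = 0, 1, 0)`, which is the replacement the recognizer performs.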



6.4 The Data Abstraction 

The variables of the source C program are local and global ones. The target
program variables are the machine registers, as well as the memory variables, which
are referred to by their memory addresses. The observable variables of the target
system are determined by the compiler's memory allocation scheme and the calling
conventions. At this stage of the research we assume that only the memory is
observable and thus, only memory variables are observable. Also, no consistency
check is done to validate that the calling conventions are correctly implemented in the
target code. Another assumption relates to the variables' types. We assume that
all variables are of integer type and that all integers are of the same size.

Registers of the target system, as well as local variables of the source system,
are not part of the system state transformer due to the use of the annotation
algorithm and state transformer composition. Thus, in order to construct the
data abstraction, it is sufficient to map the memory variables of the source
system to their memory addresses. MCVT reads the debug information from
the binary code and uses it to produce the required mapping. Generally, the
debug information of optimized code is not necessarily correct. Yet, it is
correct for memory variables at the entry and the exit points of the program. In
any case, the soundness of the proof does not depend on the correctness of this
information. That is, erroneous information could seldom lead to a false positive
conclusion.

6.5 Decomposing the Verification Conditions 

Due to the use of STS semantics, and the observables mapping extracted from
the symbolic information, it is possible to decompose the verification conditions,
so as to have one verification condition for each observable variable. We demonstrate
this by an example.

Let the source transformer be

    $\lambda(A, B).(B + 1,\ ite(A > 5, 1, 2))$,

the target transformer

    $\lambda(mem_0, mem_4).(mem_4 + 1,\ ite(\neg(mem_0 < 5), 2, 1))$

and the data abstraction mapping be the following: $mem_0 = A \wedge mem_4 = B$;
then the decomposition module produces the following two verification conditions:

    $VC_1 : mem_0 = A \wedge mem_4 = B \rightarrow mem_4 + 1 = B + 1$
    $VC_2 : mem_0 = A \wedge mem_4 = B \rightarrow ite(mem_0 < 5, 2, 1) = ite(A > 5, 1, 2)$
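The decomposition itself is mechanical, as the following sketch suggests. It works purely on strings and makes no attempt to check validity; the function name `decompose`, the dictionary layout and the textual `AND` / `->` syntax are assumptions made for the illustration. Each transformer is given as a map from a variable to the expression for its next value, and the mapping pairs every source variable with its memory location.

```python
def decompose(source, target, mapping):
    """Build one verification condition per observable variable."""
    premise = " AND ".join(f"{mem} = {var}" for var, mem in mapping.items())
    return [f"({premise}) -> ({target[mem]} = {source[var]})"
            for var, mem in mapping.items()]


# Expressions copied from the example above; keys name the updated variable.
source = {"A": "B + 1", "B": "ite(A > 5, 1, 2)"}
target = {"mem0": "mem4 + 1", "mem4": "ite(NOT (mem0 < 5), 2, 1)"}
mapping = {"A": "mem0", "B": "mem4"}

vcs = decompose(source, target, mapping)
```

The first generated condition reads `(mem0 = A AND mem4 = B) -> (mem4 + 1 = B + 1)`, which corresponds to $VC_1$. Each condition can then be handed to the validity checker independently.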

6.6 Adaptation to CVC 

In certain cases CVC does not satisfy MCVT's needs. One example is the type of
variables. Simple types of CVC are either real or boolean. For real variables $i, j$,
the formula $i > j \Leftrightarrow i \geq j + 1$ is not valid. However, it is valid for integer
variables $i, j$, and the compiler may use this equivalence to optimize the code. To
handle this problem, MCVT adds the needed assertions (e.g. $(i > j) = (i \geq j + 1)$)
whenever a comparison of this kind is encountered.

Another problem is that CVC does not support multiplication commutativity
when both multiplication operands are variables. MCVT adds the needed
assertions.
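The difference between the integer and real readings of this equivalence is easy to see on concrete values. The snippet below is only a sanity check written for this purpose (the function name `equivalence_holds` is hypothetical); it uses exact rationals to stand in for reals.

```python
from fractions import Fraction


def equivalence_holds(i, j):
    """Check whether (i > j) and (i >= j + 1) have the same truth value."""
    return (i > j) == (i >= j + 1)


# Holds for every pair of integers in a small range ...
ints_ok = all(equivalence_holds(i, j)
              for i in range(-5, 6) for j in range(-5, 6))

# ... but fails for the rational pair i = 1/2, j = 0.
holds_for_half = equivalence_holds(Fraction(1, 2), 0)
```

Since 1/2 > 0 is true while 1/2 >= 1 is false, a checker that treats the variables as reals cannot prove the equivalence unless it is supplied as an assertion.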
6.7 The Verifier Module (CVC)

Within the scope of this research, we had to choose a validity checker that is 
capable of automatically validating the verification conditions. The verification 
conditions that MCVT produces are formulas, which include: first order logic 
operators (without quantifiers), Presburger Arithmetic, uninterpreted functions 
and array operations. The following tools were explored: 

1. STeP (Stanford) - The Stanford Temporal Prover [BBC+95].

2. Omega (University of Maryland) - A system for verifying Presburger formulas
as well as a class of logical formulas, and for manipulating linear
constraints over integer variables [KMP+].

3. CVT/C+tlv+tlvp (Weizmann) - A set of tools which provides a range minimizing
module, an environment that supports model checking, and a validator
for Presburger arithmetic [PSS98c], [PS96].

4. ICS (SRI International) - Integrated Canonizer and Solver. (We checked the
alpha version) [FORS01].

5. CVC (Stanford) - Cooperative Validity Checker - Supports a combination 
of decision procedures for the validity of quantifier-free first-order formulas, 
with theories of extensional arrays, linear real arithmetic and uninterpreted 
functions [SBD02]. 

After a thorough examination of the above tools, CVC was found to best 
fit our needs, due to its reliability and the wide range of decision procedures it 
supports. It combines independent cooperating decision procedures for a wide set 
of theories, into a decision procedure for the combination of the theories, based 
on a version of the Nelson-Oppen [NO79] framework. Appendix A demonstrates
the application of MCVT on a small synchronous program and its translation. 
The C source program, the target machine-code, and one verification file (which 
is one verification condition ready for CVC) are listed there. 

7 Conclusions 

In this paper we have presented an approach to the validation of translations
from synchronous C to synchronous machine code. We described its implementation
in the tool MCVT. Our goal is to use MCVT for the translation validation
of real industrial applications. Thus it has been used to validate the translation of
a program developed for training, as part of the SafeAir project. This program
contains the code for a railroad crossing controller. It handles the passage of a
train in a one-way railroad crossing. The system consists of a pair of reliable
sensors that indicate a train entering and exiting the crossing region, a signal for
entering trains and a gate for blocking the passage of cars from a side road. The
verification took 200 seconds on a Linux workstation with an i686 CPU.

Currently, we are working on validating the translation of a jet engine controller
application from Hispano-Suiza. MCVT validates the translation done by
the commercial compiler DiabData of WindRiver, which is the compiler used by
Hispano-Suiza for the development of this system. Since the jet engine is a safety
critical system, it is not allowed to use an optimizer when compiling the software.
Contrary to our expectation, the verification time is increased compared
to the verification of the optimized translation. The reason for this surprising
result is that the un-optimized assembly code is significantly longer, and contains
many redundant machine instructions, which are otherwise eliminated by
the optimizer.

    Number of Modules   Number of Lines   Validation Time [seconds]
    1                   266               1221.82
    33                  144-155           0.05-0.21
    1                   6414              0.68
    1                   6287              0.35
    1                   6286              0.82
    1                   6286              0.86
    1                   6285              0.77
    1                   6286              0.85
    1                   6286              0.87
    1                   6336              32.98
    1                   6336              32.34

Fig. 4. Run time of a jet engine translation validation

Fig. 4 summarizes the results of this effort (running MCVT on a Linux workstation
with an i686 CPU). For most of the nodes, the translation was validated in
fractions of a minute. When the size of the module increases, the verification
time increases significantly. MCVT fails to validate the translation of some of
the modules due to the limited C syntax it currently supports (e.g. case statements
and floating point constants are not supported). Another limitation is
that only a subset of the PowerPC machine code is supported.

We plan to use a new updated version of CVC, CVC Lite [cvc] from Stanford
University, which is faster and allows further simplification of the verification
conditions. We also plan to extend the set of supported synchronous programs.
A Translation Validation of a C Program

Fig. 5 shows a C program and its optimized translation. The machine code produced
by the compiler for _C_->I3 < 200 && _C_->I3 > 50 contains
a single unsigned comparison. The machine code produced for the assignment
_C_->I4 = _C_->I4 == 0 is the sequence cntlzw (count leading zeros),
rlwinm (rotate with mask). MCVT recognizes the sequence and treats it properly,
even when the two instructions are not adjacent. In Fig. 6 one of the input
files to CVC is presented (without the variable declaration part).



C Program

    typedef struct {
        int I3;
        int I4;
        int out1;
        int gc; }
    vars_rec;

    int ACS(vars_rec *_C_)
    {
        int c1; int c2;
        c1 = 123+_C_->gc;
        c2 = 567+_C_->gc;
        if (((_C_->I3) < 200) &&
            (50 < (_C_->I3)) &&
            _C_->I4)
        { _C_->out1 = c1;
        }
        else { _C_->out1 = c2;
        }
        _C_->I4 = (_C_->I4 == 0);
        return (1);
    }

Machine-Code Program

    ACS:     lwz     r12,0(r3)
    ACS+4:   lwz     r6,12(r3)
    ACS+8:   addi    r12,r12,-51
    ACS+12:  cmpli   0,r12,149
    ACS+16:  addi    r5,r6,123
    ACS+20:  addi    r4,r6,567
    ACS+24:  lwz     r6,4(r3)
    ACS+28:  bc      4,0,ACS+48
    ACS+32:  cmpi    0,r6,0
    ACS+36:  bc      12,2,ACS+48
    ACS+40:  stw     r5,8(r3)
    ACS+44:  b       ACS+52
    ACS+48:  stw     r4,8(r3)
    ACS+52:  cntlzw  r12,r6
    ACS+56:  rlwinm  r12,r12,27,5,31
    ACS+60:  stw     r12,4(r3)
    ACS+64:  addi    r3,r0,1
    ACS+68:  bclr    20,0

Fig. 5. C program and its translation to machine code
    %Machine code conditions
    ASSERT p1 = (((MEMORY0+-51)>=0) AND ((MEMORY0+-51)<149));
    ASSERT p2 = (IF (p1) THEN
                    (MEMORY4=0) ELSE
                    ((-1>=(MEMORY0+-51)) OR ((MEMORY0+-51)=149)) ENDIF);

    %C code conditions
    ASSERT pc1 = (((I3<200) AND (50<I3)) AND (NOT I4=0));

    %Target state transformer for one variable
    ASSERT PMEMORY8 = (IF ((NOT p1 OR (p1 AND p2))) THEN
                          (MEMORY12+567) ELSE
                          (IF ((p1 AND NOT p2)) THEN
                              (MEMORY12+123) ELSE
                              MEMORY8 ENDIF) ENDIF);

    %Data mapping
    ASSERT Pout1 = PMEMORY8;
    ASSERT gc = MEMORY12;
    ASSERT I3 = MEMORY0;
    ASSERT Pgc = PMEMORY12;
    ASSERT I4 = MEMORY4;
    ASSERT PI3 = PMEMORY0;
    ASSERT PI4 = PMEMORY4;
    ASSERT out1 = MEMORY8;

    %Integer assertions
    ASSERT (((MEMORY0+-51)>=0)=((MEMORY0+-51)>(0-1)));
    ASSERT (((MEMORY0+-51)<149)=(149>=((MEMORY0+-51)+1)));
    ASSERT ((-1>=(MEMORY0+-51))=(-1>((MEMORY0+-51)-1)));

    %Source state transformer
    QUERY Pout1 = (IF (pc1) THEN
                     (123+gc) ELSE
                     (567+gc) ENDIF);

Fig. 6. One CVC input file - verification condition for one variable
Abstract. Symbolic representations and acceleration algorithms are
emerging methods to extend model-checking to infinite state space
systems. However, until now there has been no general theory of acceleration,
and designing acceleration algorithms for new data types is a complex
task. On the other hand, protocols rarely manipulate new data types,
but rather new combinations of well-studied data types. For this reason,
in this paper we focus on the automatic construction of symbolic
representations and acceleration algorithms from existing ones.

Keywords: reachability set, unbounded heterogeneous data, composi- 
tion of symbolic representations and acceleration methods. 



1 Introduction 



Context. Automatic verification of systems is a major issue considering the
growing computerization of society. At the same time, software-based systems
become more and more complex to verify, because of 1) unbounded data
types, 2) parametric reasoning, and 3) heterogeneous data interacting with each
other. Standard model-checking techniques, well-suited to the verification of
systems manipulating data ranging over finite domains, cannot handle these new
challenges.

Unbounded data and parametric reasoning. During the past decade,
promising methods have emerged to extend model-checking to infinite systems:
1) symbolic representations to represent infinite sets, and 2) widening or acceleration
techniques to compute in one step infinite behaviors of the system. For several
data types, symbolic representations with acceleration exist, tools are available
and encouraging case studies have been conducted: integers [Boi98,BFLP03,
LAS], reals [AAB00,BJW01,BHJ03], lossy [ABJ98] or perfect [BGWW97,BH99,
FPS03] channels, etc.

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 248-262, 2004. 
Heterogeneous data. On the other hand, very little work has been conducted
on data heterogeneity. Of course, some approaches listed above consider
heterogeneous variables. However, none of these approaches considers data
heterogeneity in general, but only particular combinations of data types. At the
same time, real systems naturally use many data types. For example, to model a
communication protocol, we need words for the channel content, while the maximum
number of reemissions or the clocks used to abort a connection require numeric
variables. None of the specific symbolic representations listed above can handle
all these features, even if they are all available individually. So when considering
a system combining data types for which symbolic representations and accelerations
are well known, we still have to design an ad hoc symbolic representation and
acceleration. This is neither trivial nor satisfactory. An interesting approach
would be to automate the construction of the symbolic representations and the
acceleration algorithm from the existing ones.

Related work. The Composite Symbolic Library [YKTB01] makes an attempt,
but it considers only computing symbolic representations sound for the post
operation, not for acceleration, which is much more difficult. Indeed, in their specific
case, the Cartesian product of representations is sufficient. Actually the library
contains booleans (BDDs) and counters (convex sets or finite automata). TReX
[ABS01] combines accelerations on lossy channels and numeric variables. However,
since they use a Cartesian product, they just obtain an upper approximation
of the acceleration. Some approaches [Tav04] aim at making different tools work
together. This is a combination at a very high level (tool), not at the symbolic
representation level. In [BH99], acceleration of different FIFO queues is done
using synchronization between iteration variables. Here we extend the method
and give a general framework for any data types, symbolic representations and
accelerations satisfying some hypotheses. Finally, [Bou01] proposes a framework
for the verification of infinite systems based on languages and rewriting systems.
This framework recovers the accelerations presented here for queues and stacks, as
well as accelerations for parameterized networks of identical processes. However,
the combination of algorithms is not explored.

Our results. In this paper we explore the construction of new acceleration
algorithms for heterogeneous data, from the accelerations available on each data type.
We propose a generic framework, based on the notions of weak heterogeneous
systems, Presburger symbolic representations and counting accelerations, to build
automatically new symbolic representations and accelerations from existing ones.

1. First we show that for some reasonable subclasses of heterogeneous systems
(weak heterogeneous), the Cartesian product of symbolic representations of
each data type of the system is a good symbolic representation for the system.
However the Cartesian product of accelerations is just an upper approximation.

2. Then we propose subclasses of symbolic representations (Presburger
symbolic representation) and accelerations (counting acceleration) for which the
construction of a symbolic representation and an acceleration algorithm of the
Cartesian product can be automated. We then review existing symbolic
representations and accelerations, showing that our framework includes much existing
work. Presburger symbolic representations and counting accelerations already
exist for unbounded counters, reals/clocks, perfect FIFO queues and stacks.

3. Finally we discuss the architecture of a model-checker for infinite systems
with arbitrary heterogeneous data. For any arbitrary weak heterogeneous system,
data types are identified and a specific symbolic representation and acceleration
are automatically built from the ones available in a library. Then a generic
semi-algorithm is used to compute the reachability set of the system. Checking
safety properties is straightforward. So the user only has to provide symbolic
representations and accelerations for data types not available in the library.
All the rest is done automatically. This approach does not claim to be more
efficient than dedicated symbolic representations and algorithms, but it allows
reusing a maximum of existing work.

Outline. The rest of the paper is structured as follows. Section 2 lists some 
existing symbolic representations, used in the rest of the paper to illustrate the 
definitions. Section 3 provides basic definitions of heterogeneous systems and 
symbolic representations. Section 4 defines interesting subclasses of symbolic 
representations and accelerations techniques, from which acceleration on new 
data types can be automatically computed. Section 5 shows how these results 
can be used to design a tool for the verification of heterogeneous systems with 
arbitrary (unbounded) data types. 

2 Existing Symbolic Representations 

2.1 Notations 

$\mathbb{N}$, $\mathbb{Z}$, $\mathbb{R}$ are the sets of non-negative integers, integers and reals. For a set $S$,
$\mathcal{P}(S)$ denotes the powerset of $S$, while $\mathcal{P}_f(S)$ denotes the set of finite subsets of $S$.

Let $r_1, r_2$ be two relations over $P_1 \times Q_1$ and $P_2 \times Q_2$. We define $r_1 \times r_2 \subseteq
(P_1 \times P_2) \times (Q_1 \times Q_2)$ as $r_1 \times r_2 = \{((p_1, p_2), (q_1, q_2)) \mid (p_1, q_1) \in r_1 \wedge (p_2, q_2) \in r_2\}$.
The image of $r_1$, written $Im(r_1)$, is $Q_1$.

Let $\Sigma$ be a finite alphabet. A linear regular expression (LRE) on $\Sigma$ is a
regular expression of the form $u_1 v_1^* u_2 v_2^* \cdots u_n v_n^*$ with $u_i, v_i \in \Sigma^*$. A semi-linear
regular expression (SLRE) [FPS03] on $\Sigma$ is a finite union of LREs. A simple
regular expression (SRE) [ABJ98] over $\Sigma$ is a sum of products of $(a_i + \varepsilon)$ and
$(a_1 + \ldots + a_n)^*$ with $a_i \in \Sigma^*$.
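For finite relations, the product $r_1 \times r_2$ defined in the notations above can be written directly as a set comprehension. The sketch below encodes a relation as a Python set of pairs; the function name `product` is chosen for the example only.

```python
def product(r1, r2):
    """Product of two relations given as sets of (source, target) pairs."""
    return {((p1, p2), (q1, q2)) for (p1, q1) in r1 for (p2, q2) in r2}


# A counter step combined with two possible channel steps.
counter_step = {(0, 1)}
channel_step = {("a", "b"), ("b", "c")}
combined = product(counter_step, channel_step)
```

Each element of `combined` relates a pair of source states to a pair of target states, which is how a transition acting on two data types at once is viewed in the sections that follow.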



2.2 Presburger Arithmetics 

Presburger arithmetics is the first-order theory of $\mathbb{N}$, with predicates $<$ and $+$,
denoted $\langle \mathbb{N}, <, + \rangle$. Presburger arithmetics is decidable, and sets over $\mathbb{N}^m$
defined by a Presburger formula can be represented and manipulated by finite
automata [BC96]. These constructions can be extended to $\mathbb{Z}$ and $\mathbb{R}$ (using weak
deterministic Büchi automata for the real case). Actually weak deterministic
Büchi automata represent more than Presburger arithmetics. They can encode
the theory $\langle \mathbb{R}, <, +, Z \rangle$ [BJW01], where $Z$ is a predicate stating whether a real is
an integer or not. In the following sections, we will write Presburger arithmetics
on $\mathbb{R}$ for $\langle \mathbb{R}, <, +, Z \rangle$. Identifying a Presburger formula $\varphi$ with the set of its
solutions, we write $x \in \varphi$ instead of $x \models \varphi$.



2.3 Symbolic Representations 

We briefly present here several symbolic representations. This list does not
attempt to be exhaustive. It will be used throughout the paper to show that our
general framework captures many existing works.

Boolean. BDDs [Bry92] are a compact representation for (large) sets of boolean
values, using deterministic acyclic graphs.

Numeric variables. Convex polyhedra [HPR97] can be used to represent sets
of reals, but the representation is not closed under union, and usually upper
approximations are made (convex hull). NDDs [WB00]/UBAs [Ler03] are a symbolic
representation of integers by automata. RVAs [BJW01] use a similar encoding
for reals. DBMs [Dil89] are a symbolic representation for clocks. CPDBM
[AAB00] is an extension of DBM to handle unbounded integers. Basically the
DBM is extended with parameters and (possibly non-linear) constraints.

Perfect FIFO Queues/stacks. QDDs [BGWW97] represent the content of a perfect
FIFO queue by a regular expression on the (finite) alphabet of messages.
SLRE [FPS03] is based on the same ideas, but uses SLREs instead of regular
expressions. CQDDs [BH99] can be seen as an SLRE with variables attached to
each $*$ and a Presburger formula linking the variables. It is straightforward to
use these representations for stacks [Bou01].

Lossy FIFO Queues. [ABJ98] uses Simple Regular Expressions (SRE) to represent
the content of lossy FIFO channels.

3 Combination of Symbolic Representations 

In this section, we handle the following issue: given a system with heterogeneous 
data types, and assuming we have symbolic representations for each data type, 
can we obtain easily a symbolic representation for the whole system? The answer 
is yes. However the method does not work anymore for acceleration. 



3.1 Preliminaries 

A transition system is a pair $(D, \rightarrow)$ where $D$ is a set (the domain) and
$\rightarrow \subseteq D \times D$ is the transition relation. In general, the relation $\rightarrow$ is not
recursive. However in the following sections we will always suppose that $\rightarrow$ is
finitely presented by a finite set of recursive relations over $D \times D$. Formally we
suppose that there exist $m > 0$ recursive relations $r_i \subseteq D \times D$ such that
$\rightarrow = r_1 \cup \ldots \cup r_m$. Let $\mathcal{R} = \{r_1, \ldots, r_m\}$. A finitely presented transition
system, shortly a system, will be written $(D, \mathcal{R})$ instead of $(D, \rightarrow)$.

Remark 1. Any transition system represented by a control graph labeled by
actions on a finite set of variables follows this definition. $\mathcal{R}$ is then the set of
transitions. In this paper we are mainly interested in systems manipulating
heterogeneous variables, i.e. systems whose domain $D$ can be written as a Cartesian
product of elementary domains $D_i$. In the following sections, $D_1 \times \ldots \times D_n$ is
written $\times D_i$.

Definition 1 (symbolic representation). A symbolic representation for a
system $\mathcal{H} = (D, \mathcal{R})$ is a 5-tuple
$\mathcal{S} = (S, \gamma, \sqcup, \sqsubseteq, \mathrm{post})$ where $S$ is
the set of symbolic states, the concretization function
$\gamma : S \to \mathcal{P}(D)$ maps each $s \in S$ to a set of concrete states
in $D$, $\sqcup$ and $\sqsubseteq$ are union and inclusion on $S$ and
$\mathrm{post} : \mathcal{R} \times S \to S$ computes the successors of a
symbolic state, satisfying:

1. $\gamma(s_1 \sqcup s_2) = \gamma(s_1) \cup \gamma(s_2)$ (consistency of union);

2. $\gamma(\mathrm{post}(r, s)) = r(\gamma(s))$ (consistency of post);

3. $s_1 \sqsubseteq s_2 \Rightarrow \gamma(s_1) \subseteq \gamma(s_2)$ (consistency of inclusion).

A symbolic representation is effective if $\sqcup$ and post are computable and
$\sqsubseteq$ is decidable.
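
As an illustration of Definition 1, the following is a minimal Python sketch, assuming a toy system over a four-element domain in which a symbolic state is simply a finite set of concrete states. The class `FiniteSetRepr`, the helper `check_consistency` and the relation `r_inc` are hypothetical names introduced for this example only; they do not come from any of the tools cited above.

```python
from itertools import product


class FiniteSetRepr:
    """Toy symbolic representation: a symbolic state is a frozenset of concrete states."""

    def gamma(self, s):
        # Concretization: a symbolic state denotes exactly its own elements.
        return set(s)

    def union(self, s1, s2):
        return frozenset(s1) | frozenset(s2)

    def included(self, s1, s2):
        return frozenset(s1) <= frozenset(s2)

    def post(self, r, s):
        # r is a relation given as a set of pairs (x, y).
        return frozenset(y for (x, y) in r if x in s)


def image(r, concrete):
    """Concrete successor set r(X) of a set X under the relation r."""
    return {y for (x, y) in r if x in concrete}


def check_consistency(rep, r, states):
    """Check the three requirements of Definition 1 on the given symbolic states."""
    for s1, s2 in product(states, repeat=2):
        if rep.gamma(rep.union(s1, s2)) != rep.gamma(s1) | rep.gamma(s2):
            return False  # union is not consistent
        if rep.included(s1, s2) and not rep.gamma(s1) <= rep.gamma(s2):
            return False  # inclusion is not consistent
    return all(rep.gamma(rep.post(r, s)) == image(r, rep.gamma(s)) for s in states)


# Hypothetical relation on D = {0, 1, 2, 3}: increment modulo 4.
r_inc = {(x, (x + 1) % 4) for x in range(4)}
rep = FiniteSetRepr()
samples = [frozenset(), frozenset({0}), frozenset({1, 3}), frozenset(range(4))]
print(check_consistency(rep, r_inc, samples))
```

In this degenerate case the representation is trivially effective; the representations discussed in this paper differ in that a finite symbolic state may denote an infinite set.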

Some examples. UBAs, NDDs and RVAs are effective symbolic representations
for $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{R}$ with Presburger relations.
QDDs, SLREs and CQDDs are effective symbolic representations for perfect FIFO
queues or stacks. SREs are an effective symbolic representation for lossy FIFO
channels. DBMs are an effective symbolic representation for clocks and bounded
counters. CPDBMs are a symbolic representation for clocks and counters, but
they are not effective because basic operations require solving non-linear
arithmetic formulas.

Definition 2 (Reachability set problem). Given a system
$\mathcal{H} = (D, \mathcal{R})$, a symbolic representation
$\mathcal{S} = (S, \gamma, \sqcup, \sqsubseteq, \mathrm{post})$ for
$\mathcal{H}$ and an initial symbolic state $s_0 \in S$, the reachability set
problem is to compute $s' \in S$ such that

$\mathcal{R}^*(\gamma(s_0)) = \gamma(s')$.

The reachability set problem is often undecidable. However semi-algorithms
have been developed to solve it in practical cases. Acceleration computes in
one step the transitive closure of a relation. It differs from widening [CC77]
in that widening aims at termination and loses exactness. Given a relation $r$
and a set $D$, the acceleration of $r$ applied to $D$ is defined by
$r^*(D) = \bigcup_{i \in \mathbb{N}} r^i(D)$.

Definition 3 (acceleration function). Consider a system
$\mathcal{H} = (D, \mathcal{R})$ and
$\mathcal{S} = (S, \gamma, \sqcup, \sqsubseteq, \mathrm{post})$ an effective
symbolic representation for $\mathcal{H}$. An acceleration function for
$(\mathcal{H}, \mathcal{S})$ is a computable totally defined function
$\mathrm{post}^* : \mathcal{R}^* \times S \to S$ such that
$\forall (r, s) \in \mathcal{R}^* \times S$,
$\gamma(\mathrm{post}^*(r, s)) = r^*(\gamma(s))$.
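
To make the difference between iterating post and applying post* tangible, here is a small Python sketch under strong simplifying assumptions: the domain is a single natural-number counter, a relation is an unguarded translation x -> x + c with c > 0, and a symbolic state is a linear set written as a pair (base, periods). All function names are hypothetical, and the concretization is truncated at a bound so that it can be enumerated.

```python
def gamma(state, bound):
    """Concretization of the linear set (base, periods), truncated to values <= bound.

    The set denoted is {base + k1*p1 + ... + kn*pn | ki >= 0}; periods are assumed positive.
    """
    base, periods = state
    reached = set()
    frontier = {base} if base <= bound else set()
    while frontier:
        reached |= frontier
        frontier = {x + p for x in frontier for p in periods if x + p <= bound} - reached
    return reached


def post(c, state):
    """One application of the relation x -> x + c."""
    base, periods = state
    return (base + c, periods)


def post_star(c, state):
    """Acceleration of x -> x + c: the increment becomes one more period."""
    base, periods = state
    return (base, periods | {c})


def iterate(c, state, bound):
    """Reference result: union of gamma(post^i(state)) for i = 0, 1, 2, ... (truncated)."""
    result = set()
    current = state
    while current[0] <= bound:
        result |= gamma(current, bound)
        current = post(c, current)
    return result


s0 = (1, frozenset({4}))  # denotes {1, 5, 9, ...}
print(gamma(post_star(3, s0), 30) == iterate(3, s0, 30))
```

The iteration needs one post step per element of the orbit, whereas `post_star` produces a single symbolic state whose concretization coincides with the union of all iterates, which is the equation required in Definition 3.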

Until now, there is no general theory of acceleration, and existing algorithms
rely heavily on the structure of the symbolic representation. Accelerations
exist for SREs [ABJ98], QDDs [BGWW97] and SLREs [FPS03] with restricted
operations on queues, CQDDs for queues [BH99] and stacks [Bou01], NDDs [Boi98]
and UBAs [FL02] with restricted affine functions, and RVAs [BHJ03] with
restricted hybrid transitions. The acceleration defined on CPDBMs [AAB00] is
not effective since it implies solving non-linear arithmetic formulas.

We define here the class of systems, namely flat systems, for which having
an acceleration function is sufficient to solve the reachability problem.
Consider $\mathcal{R} = \{r_1, \ldots, r_n\}$ a set of relations and
$\bar{\mathcal{R}} = \{\bar{r}_1, \ldots, \bar{r}_n\}$ its associated set of
letters. We write $RegExp(\bar{\mathcal{R}})$ the set of regular expressions
over $\bar{\mathcal{R}}$. We define inductively
$E : RegExp(\bar{\mathcal{R}}) \to \mathcal{R}^*$ by $E(\emptyset) = \emptyset$,
$E(\varepsilon)$ is the identity relation, $E(\bar{r}_i \cdot w) = r_i \cdot E(w)$,
$E(w_1 + w_2) = E(w_1) \cup E(w_2)$ and $E(w^*) = E(w)^*$.

Definition 4 (flatness). Consider a system $\mathcal{H} = (D, \mathcal{R})$
with $\mathcal{R} = \{r_1, \ldots, r_n\}$. $\mathcal{H}$ is flat if there
exists $L \in RegExp(\bar{\mathcal{R}})$ such that $L$ is a SLRE and
$\mathcal{R}^* = E(L)$.

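Theorem 1. Let $\mathcal{H} = (D, \mathcal{R})$ be a flat system,
$\mathcal{S} = (S, \gamma, \sqcup, \sqsubseteq, \mathrm{post})$ a symbolic
representation for $\mathcal{H}$ and assume there exists an acceleration
function for $(\mathcal{H}, \mathcal{S})$. Then for any $s_0 \in S$,
$\mathcal{R}^*(s_0)$ is computable.

Sketch of the proof.

Since $\mathcal{H} = (D, \mathcal{R})$ is flat, there exists $L$ a SLRE on
$\bar{\mathcal{R}}$ such that $\mathcal{R}^* = E(L) = \mathcal{L}$. Then
computing $\mathcal{R}^*(s_0)$ comes down to computing $\mathcal{L}(s_0)$.
Since $\mathcal{L} = E(L)$ and $L$ is a SLRE over $\bar{\mathcal{R}}$,
$\mathcal{L}$ is of the form $\bigcup_i U_i$, where each $U_i$ is of the form
$U_i = u_{i,1}^* v_{i,1} \ldots u_{i,n_i}^* v_{i,n_i}$, where all the
$u_{i,j}$, $v_{i,j}$ are relations in $\mathcal{R}$.

By hypothesis, $S$ is closed by post for any function of $\mathcal{R}$ and by
post* for any function of $\mathcal{R}^*$, and these are computable. Thus each
$s_i = U_i(s_0) = u_{i,1}^* v_{i,1} \ldots u_{i,n_i}^* v_{i,n_i}(s_0)$ exists
and is computable. Then since $S$ is closed by $\sqcup$, there exists
$s' \in S$ such that $s' = \bigsqcup_i s_i = (E(L))(s_0) = \mathcal{R}^*(s_0)$
and $s'$ is computable. □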
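
The proof sketch can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the system is a single counter with two unguarded translations (+2 and +5), symbolic states are linear sets (base, periods) with positive periods, and a SLRE is encoded as a list of terms, each term being a list of (increment, starred) pairs applied from left to right. These encodings and all names are hypothetical. Because translations commute, every composition of +2 and +5 equals some (+2)^a (+5)^b, so the toy system is flat with L = (+2)* (+5)*.

```python
def post(c, state):
    """Successor of the linear set (base, periods) under x -> x + c."""
    base, periods = state
    return (base + c, periods)


def post_star(c, state):
    """Acceleration of x -> x + c: c becomes an additional period."""
    base, periods = state
    return (base, periods | {c})


def eval_term(term, state):
    """Apply one SLRE term, given as a list of (increment, starred) pairs."""
    for c, starred in term:
        state = post_star(c, state) if starred else post(c, state)
    return state


def reach(slre, init):
    """Symbolic reachability set: the union is kept as a list, one state per term."""
    return [eval_term(term, init) for term in slre]


def gamma(states, bound):
    """Concrete values <= bound denoted by a list of linear sets."""
    values = set()
    for base, periods in states:
        seen = set()
        frontier = {base} if base <= bound else set()
        while frontier:
            seen |= frontier
            frontier = {x + p for x in frontier for p in periods if x + p <= bound} - seen
        values |= seen
    return values


def brute_force(start, increments, bound):
    """Explicit-state exploration, used here only as a reference."""
    seen, frontier = set(), {start}
    while frontier:
        seen |= frontier
        frontier = {x + c for x in frontier for c in increments if x + c <= bound} - seen
    return seen


slre = [[(2, True), (5, True)]]  # the single term (+2)* (+5)*
init = (0, frozenset())
print(gamma(reach(slre, init), 20) == brute_force(0, [2, 5], 20))
```

The symbolic computation applies post* once per starred factor and never enumerates concrete states; the brute-force exploration is there only to check the result on a bounded window.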



3.2 Weak Heterogeneous Systems 

Definition 5 (weak heterogeneous systems). Let
$\mathcal{H} = (\times D_i, \mathcal{R})$ be a system. $\mathcal{H}$ is weakly
heterogeneous if there exist $n$ finite sets $\mathcal{R}_i$ of relations over
$D_i$ such that $\mathcal{R} \subseteq \times\mathcal{R}_i$. We write
$\mathcal{H} = (\times D_i, \mathcal{R}, \times\mathcal{R}_i)$.

Intuitively, in a weak heterogeneous system data types are strongly
encapsulated. Each data type has its own operations, and the whole system is
built by combining these operations. Since this is consistent with modular or
object-oriented design, weak heterogeneous systems are widespread in practice.

For example, communication protocols using (possibly lossy) channels with
finite sets of messages, a (parameterized) maximum number of retransmissions
and clocks for aborting are weak heterogeneous systems. This includes many
well-known protocols like the Alternating Bit Protocol, the Bounded
Retransmission Protocol, the sliding window protocols, etc. Typical operations
are of the form
(counter < MaxRetrans), (clock < MaxDelay) → (!mess), (counter++)
where counters, clocks and the channel are used with restricted interaction.

On the other hand, a weak heterogeneous system with queues and counters
cannot model writing the value of a counter into a queue, since it implies mixing
the structures of the data types, and not only their operations.

In the following, we want to derive algorithms or properties on the whole
system $\mathcal{H}$ from the study of the projected systems
$(D_i, \mathcal{R}_i)$.







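Theorem 2. Let $\mathcal{H} = (\times D_i, \mathcal{R}, \times\mathcal{R}_i)$
be a weak heterogeneous system. Assume that for all $i \le n$, there exists
$(S_i, \gamma_i, \sqcup_i, \sqsubseteq_i, \mathrm{post}_i)$ an effective
symbolic representation for $(D_i, \mathcal{R}_i)$. Then
$(\mathcal{P}_f(\times S_i), \times\gamma_i, \cup, \times\sqsubseteq_i, \times\mathrm{post}_i)$
is an effective symbolic representation for $\mathcal{H}$, where $\cup$ is the
union operator on $\mathcal{P}_f(\times S_i)$.

Proof. We denote $s_i \in S_i$, $v = (s_1, \ldots, s_n) \in \times S_i$,
$p = \{v_1, \ldots, v_k\} \in \mathcal{P}_f(\times S_i)$.
With $\cup$, the consistency of union (1) is evident. (2) Consistency of post.
Let $r = \times r_i \in \mathcal{R}$. Then
$r(\gamma(v)) = r(\times\gamma_i(s_i))$. Since $\mathcal{H}$ is weak
heterogeneous, it follows that $r(\gamma(v)) = \times r_i(\gamma_i(s_i))$. Each
$\mathcal{S}_i$ is a symbolic representation for $\mathcal{H}_i$. Then
$r(\gamma(v)) = \times\gamma_i(\mathrm{post}_i(r_i, s_i))$, which is equal to
$\gamma(\times\mathrm{post}_i(r, v))$. That proves the property for
$\times S_i$. The extension to $\mathcal{P}_f(\times S_i)$ is straightforward.
(3) Consistency of inclusion. Assume that $v \times\!\sqsubseteq_i v'$. Then by
definition $\forall i, s_i \sqsubseteq_i s_i'$. Each $\mathcal{S}_i$ is a
symbolic representation for $\mathcal{H}_i$, then
$\forall i, \gamma_i(s_i) \subseteq \gamma_i(s_i')$. Thus
$\times\gamma_i(v) \subseteq \times\gamma_i(v')$. The extension to
$\mathcal{P}_f(\times S_i)$ is defined by
$p \times\!\sqsubseteq_i p' \Leftrightarrow \forall k, \exists j, v_k \times\!\sqsubseteq_i v_j'$. □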

We just proved that, given arbitrary symbolic representations for certain
domains and relations, their Cartesian product is a symbolic representation for
the associated weak heterogeneous systems. A natural question is: does this
hold also for acceleration? The answer is no. Indeed in general, the Cartesian
product of accelerations is an upper approximation. For example TReX uses such
an approximation to combine counters/clocks with lossy FIFO channels.

Theorem 3. Let $\mathcal{H} = (\times D_i, \mathcal{R}, \times\mathcal{R}_i)$
be a weak heterogeneous system. Assume that for all $i \le n$, there exists
$\mathcal{S}_i$ a symbolic representation for
$\mathcal{H}_i = (D_i, \mathcal{R}_i)$ and an acceleration function
$\mathrm{post}^*_i$ for $(\mathcal{H}_i, \mathcal{S}_i)$. Then
$\times\mathrm{post}^*_i$ is an upper approximation of $\mathrm{post}^*$ on
$(\mathcal{H}, \times\mathcal{S}_i)$.

Sketch of the proof.

Let $r = r_1 \times r_2$ and $s = (s_1, s_2)$. Intuitively,
$\mathrm{post}^*_1(r_1, s_1) \times \mathrm{post}^*_2(r_2, s_2)$ represents
$\bigcup_{i \in \mathbb{N}} \bigcup_{j \in \mathbb{N}} r_1^i(s_1) \times r_2^j(s_2)$
while
$\mathrm{post}^*(r, s) = \bigcup_{k \in \mathbb{N}} r_1^k(s_1) \times r_2^k(s_2)$. □
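
A tiny Python illustration of this gap, assuming two counters that are both incremented by the same transition r = (x++) x (y++) and a bound on the number of iterations so that the sets stay finite. The function names are hypothetical.

```python
BOUND = 5  # iteration counts range over 0..BOUND (finite stand-in for N)


def accel(step, start):
    """Bounded stand-in for a component acceleration: {start + i*step | 0 <= i <= BOUND}."""
    return {start + i * step for i in range(BOUND + 1)}


def product_of_accelerations(x0, y0):
    """Each component is accelerated on its own: the iteration counts i and j are independent."""
    return {(x, y) for x in accel(1, x0) for y in accel(1, y0)}


def exact_acceleration(x0, y0):
    """The product relation is iterated as a whole: both components share the count k."""
    return {(x0 + k, y0 + k) for k in range(BOUND + 1)}


exact = exact_acceleration(0, 0)
approx = product_of_accelerations(0, 0)
print(exact < approx)                      # strict inclusion of the exact set
print(sorted(approx - exact)[:3])          # some spurious states
```

The spurious pairs are exactly those in which the two counters have been incremented a different number of times, which the synchronization on a shared iteration variable in Section 4 is designed to rule out.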

4 Combination of Accelerations 

We define in this section a particular class of symbolic representations, and a
particular class of acceleration functions such that if there exist such symbolic
representations and acceleration functions for domains $D_1, \ldots, D_n$, we
can automatically build a symbolic representation and an acceleration function
for a weak heterogeneous system on $D_1, \ldots, D_n$. These classes cover many
existing symbolic representations and accelerations.



4.1 General Framework 

The main idea is to use symbolic representations which combine some finite
part (a word in our definition) with a counting part, represented as a formula
added to the finite part. Then for the acceleration, the counting parts are
synchronized together. We consider constraints expressed as Presburger
formulas. Presburger arithmetic is probably not far from being a maximal logic
for our construction, since in the proofs the constraints are required to be
closed under intersection and existential quantification, and satisfiability
must be decidable. However we have not investigated this question yet. For
example it is easy to extend our construction to the whole logic defined by
binary automata.

Definition 6 (Presburger symbolic representation (PSR)).

Let $\mathcal{H} = (D, \mathcal{R})$ be a system. A Presburger symbolic
representation for $\mathcal{H}$ is an effective symbolic representation
$\mathcal{S}_P = (\mathcal{P}_f(S_P), \gamma, \cup, \sqsubseteq, \mathrm{post})$
such that:

1. $S_P$ is defined by a 4-tuple $(\Sigma, \mathcal{L}, \mathcal{N}, V)$ where
$\mathcal{L}$ is a language over an alphabet $\Sigma$, $\mathcal{N}$ is a
countable set of variables, $V : \mathcal{L} \to \mathcal{P}_f(\mathcal{N})$;
representing the set of 2-tuples $s_p = (w, \Phi(V(w)))$ with:

- $w$ is a word in $\mathcal{L}$,

- $\Phi(V(w))$ is a Presburger formula whose free variables are in $V(w)$; we
will write $\Phi(w)$ or even $\Phi$ for convenience.

2. the concretization function $\gamma$ is defined as follows:

- there exists a function $\gamma_a$, taking a word $w \in \mathcal{L}$ and a
valuation $v$ of the variables $V(w)$ to $D$, such that the image of $\gamma_a$
is not reduced to a singleton,

- $\gamma((w, \Phi)) = \bigcup_{v \in \Phi} \gamma_a(w, v)$

3. $\mathrm{post}(r, (w, \Phi(w))) = (w', \exists w, \Phi(w) \wedge \varphi(w, w'))$
where $w'$ and $\varphi$ depend only on $r$ and $w$. post is then extended to
$\mathcal{P}_f(S_P)$ using the $\cup$ operator.

4. $(w, \Phi(w)) \sqsubseteq (w', \Phi'(w')) \Leftrightarrow w = w' \wedge \Phi \Rightarrow \Phi'$

Requirement 2 means that the concretization function really takes into
account the counting part of the representation. Requirement 3 means that a post
operation modifies the word $w$ and the formula $\Phi$ according to $w$, adding
to $\Phi$ a formula $\varphi$ linking old variables of $w$ and new variables of
$w'$. Many existing symbolic representations are Presburger symbolic
representations.

Remark 2. These properties hold for Presburger symbolic representations:

- $\gamma((w, \mathrm{False})) = \emptyset$;

- $\gamma((w, \Phi_1 \vee \Phi_2)) = \gamma((w, \Phi_1)) \cup \gamma((w, \Phi_2))$;

- $\gamma_a((w, (v_1, \ldots, v_n))) = \gamma((w, \bigwedge_i w_i = v_i))$;

- if $\mathrm{post}(r, (w, \Phi_1)) = (w', \exists w\, \Phi_1 \wedge \varphi)$
then $\mathrm{post}(r, (w, \Phi_2)) = (w', \exists w\, \Phi_2 \wedge \varphi)$.

Theorem 4. NDDs, UBAs, RVAs and CQDDs are Presburger symbolic
representations.

Sketch of the proof.

For NDDs/UBAs/RVAs over 2 variables $x, y$ take $\Sigma = \{\varepsilon\}$,
$\mathcal{L} = \{\varepsilon\}$, $\mathcal{N} = \{x, y\}$,
$V : \varepsilon \mapsto \{x, y\}$. For CQDDs over a finite set of messages
$\Sigma_{msg}$, take $\Sigma = \Sigma_{msg} \cup \{*\}$, $\mathcal{L}$ the
language of SLREs over $\Sigma_{msg}$ (which are words over $\Sigma$) and $V$ a
function which associates to each $*$ in a word in $\mathcal{L}$ a distinct
variable. Notice that in this case, we use a restricted inclusion on CQDDs, only
semi-exact. □

BDDs, QDDs, SLREs and SREs are not Presburger symbolic representations, since
there is no counting in these representations. CPDBMs are not PSRs either,
since the formulas used lie outside Presburger arithmetic.
Definition 7 (Counting acceleration). Let $\mathcal{H} = (D, \mathcal{R})$ be
a system and
$\mathcal{S}_P = (\mathcal{P}_f(S_P), \gamma, \cup, \sqsubseteq, \mathrm{post})$
a Presburger symbolic representation for $\mathcal{H}$. A counting acceleration
for $(\mathcal{H}, \mathcal{S}_P)$ is an acceleration function
$\mathrm{post}^*$ for $(\mathcal{H}, \mathcal{P}_f(S_P))$ such that
$\forall s_p = (w, \Phi(w)) \in S_P, \forall r \in \mathcal{R}$,

1. $\mathrm{post}^*(r, (w, \Phi)) = \bigcup_{i \le k} (w_i', \exists\theta \in \mathbb{N}, \exists w, \Phi(w) \wedge \varphi_i(w, w_i', \theta))$
where $(k, w_i', \varphi_i)$ depends only on $r$ and $w$;

2. $\gamma((w', \exists\theta \in \mathbb{N}, \exists w, \Phi(w) \wedge \varphi(w, w', \theta) \wedge \theta = i)) = r^i(\gamma((w, \Phi)))$.

post* is then extended to $\mathcal{P}_f(S_P)$ using the $\cup$ operator.

Intuitively a counting acceleration keeps precisely the number of iterations,
so that it is possible to manipulate it. Requirement 1 means that the post*
operation introduces an iteration variable and requirement 2 ensures that this
variable keeps precisely the number of iterations.

Remark 3. If
$\mathrm{post}^*(r, (w, \Phi_1)) = \bigcup_{i \le k} (w_i', \exists\theta \in \mathbb{N}, \exists w, \Phi_1 \wedge \varphi_i(w, w_i', \theta))$
then
$\mathrm{post}^*(r, (w, \Phi_2)) = \bigcup_{i \le k} (w_i', \exists\theta \in \mathbb{N}, \exists w, \Phi_2 \wedge \varphi_i(w, w_i', \theta))$.

Theorem 5. Counting accelerations exist for NDDs/UBAs, RVAs, and
CQDDs.

Sketch of the proof.

We cannot describe in detail the algorithms used. See theorem 8.52 of [Boi98],
proposition 2 of [FL02], theorem 5 of [BHJ03] and theorem 5.1 of [BH99]. □

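We define the synchronized product of PSRs. In this representation, the two
PSRs are synchronized on shared variables. The consequence is that given two
PSRs, the set of concrete states expressible with the Cartesian product is strictly
included in the one expressible with the synchronized product. This allows us to
define an (exact) acceleration function for the synchronized product of PSRs.

Definition 8 (Synchronized product of PSR). Let $\mathcal{S}_{P_1}$,
$\mathcal{S}_{P_2}$ be two Presburger symbolic representations
$\mathcal{S}_{P_i} = (\mathcal{P}_f(S_{P_i}), \gamma_i, \cup_i, \sqsubseteq_i, \mathrm{post}_i)$
with $S_{P_i} = (\Sigma_i, \mathcal{L}_i, \mathcal{N}_i, V_i)$. The synchronized
product $\mathcal{S}_{P_1} \otimes \mathcal{S}_{P_2}$ is defined by
$(\mathcal{P}_f(S_P), \gamma, \cup, \sqsubseteq, \mathrm{post})$ where:

- $S_P = (\Sigma_1 \times \Sigma_2, \mathcal{L}_1 \times \mathcal{L}_2, \mathcal{N}_1 \times \mathcal{N}_2, V_1 \times V_2)$,

- $\gamma(w_1, w_2, \Phi) = \bigcup_{(v_1, v_2) \in \Phi} \gamma_{1,a}(w_1, v_1) \times \gamma_{2,a}(w_2, v_2)$;

- $\cup$ is the finite union on $\mathcal{P}_f(S_P)$;

- $\mathrm{post}(w_1, w_2, \Phi) = (w_1', w_2', \exists w_1, w_2\, \Phi(w_1, w_2) \wedge \varphi_1(w_1, w_1') \wedge \varphi_2(w_2, w_2'))$;

- $(w_1, w_2, \Phi) \sqsubseteq (w_1', w_2', \Phi') \Leftrightarrow w_1 = w_1' \wedge w_2 = w_2' \wedge \Phi \Rightarrow \Phi'$.

Theorem 6. Let $\mathcal{H} = (\times D_i, \mathcal{R}, \times\mathcal{R}_i)$
be a weak heterogeneous system. Assume that $\forall i$, there exists
$\mathcal{S}_{P_i} = (\mathcal{P}_f(S_{P_i}), \gamma_i, \cup_i, \sqsubseteq_i, \mathrm{post}_i)$
a Presburger symbolic representation for $\mathcal{H}_i = (D_i, \mathcal{R}_i)$
and $\mathrm{post}^*_i$ a counting acceleration for
$(\mathcal{H}_i, \mathcal{S}_{P_i})$. Then

1. $\otimes\mathcal{S}_{P_i}$ is a Presburger symbolic representation for $\mathcal{H}$,

2. there exists a counting acceleration for $(\mathcal{H}, \otimes\mathcal{S}_{P_i})$.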
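Proof.

We prove statement 1, that $\otimes\mathcal{S}_{P_i}$ is a PSR for
$\mathcal{H}$. For the sake of conciseness, we give the proof for only 2 PSRs.
The extension to $n$ PSRs is clear. Requirement 1 of a symbolic representation
(consistency of union) is clear with $\cup$.

We prove requirement 3 of a symbolic representation, the consistency of
inclusion. Remark that if $\Phi \Rightarrow \Phi'$ then
$\{v \mid \Phi(v)\} \subseteq \{v' \mid \Phi'(v')\}$. Using this and the
definition of $\gamma$ in terms of $\gamma_a$, it is then straightforward that
$(w_1, w_2, \Phi) \sqsubseteq (w_1', w_2', \Phi') \Rightarrow \gamma((w_1, w_2, \Phi)) \subseteq \gamma((w_1', w_2', \Phi'))$.

Now we prove requirement 2 of a symbolic representation, the consistency
of post. Let us show that
$r(\gamma(w_1, w_2, \Phi)) = \gamma(\mathrm{post}(r, (w_1, w_2, \Phi)))$.
In the derivation below, $w_k = v_k$ abbreviates
$\bigwedge_i w_{k,i} = v_{k,i}$.

\begin{align*}
r(\gamma(w_1, w_2, \Phi))
&= r\Big(\bigcup_{(v_1, v_2) \in \Phi} \gamma_{1,a}(w_1, v_1) \times \gamma_{2,a}(w_2, v_2)\Big)
  && \text{by definition of } \gamma \\
&= \bigcup_{(v_1, v_2) \in \Phi} r\big(\gamma_{1,a}(w_1, v_1) \times \gamma_{2,a}(w_2, v_2)\big) \\
&= \bigcup_{(v_1, v_2) \in \Phi} r_1(\gamma_{1,a}(w_1, v_1)) \times r_2(\gamma_{2,a}(w_2, v_2))
  && \mathcal{H} \text{ is weak heterogeneous} \\
&= \bigcup_{(v_1, v_2) \in \Phi} r_1(\gamma_1(w_1, w_1 = v_1)) \times r_2(\gamma_2(w_2, w_2 = v_2))
  && \text{remark 2} \\
&= \bigcup_{(v_1, v_2) \in \Phi} \gamma_1(\mathrm{post}_1(r_1, (w_1, w_1 = v_1))) \times \gamma_2(\mathrm{post}_2(r_2, (w_2, w_2 = v_2)))
  && \mathcal{S}_i \text{ symbolic representation for } \mathcal{H}_i \\
&= \bigcup_{(v_1, v_2) \in \Phi} \gamma_1((w_1', \exists w_1\, w_1 = v_1 \wedge \varphi_1(w_1, w_1'))) \times \gamma_2((w_2', \exists w_2\, w_2 = v_2 \wedge \varphi_2(w_2, w_2')))
  && \text{definition of post} \\
&= \bigcup_{(v_1, v_2) \in \Phi} \Big(\bigcup_{v_1' \in \exists w_1\, w_1 = v_1 \wedge \varphi_1(w_1, w_1')} \gamma_{1,a}(w_1', v_1')\Big) \times \Big(\bigcup_{v_2' \in \exists w_2\, w_2 = v_2 \wedge \varphi_2(w_2, w_2')} \gamma_{2,a}(w_2', v_2')\Big)
  && \text{definitions of } \gamma_1, \gamma_2 \\
&= \bigcup_{(v_1', v_2') \in \exists w_1, w_2\, \Phi(w_1, w_2) \wedge \varphi_1(w_1, w_1') \wedge \varphi_2(w_2, w_2')} \gamma_{1,a}(w_1', v_1') \times \gamma_{2,a}(w_2', v_2') \\
&= \gamma((w_1', w_2', \exists w_1, w_2\, \Phi(w_1, w_2) \wedge \varphi_1(w_1, w_1') \wedge \varphi_2(w_2, w_2')))
\end{align*}

Then $r(\gamma(w_1, w_2, \Phi)) = \gamma(\mathrm{post}(r, (w_1, w_2, \Phi)))$.
This proves requirement 2, and the first statement, that
$\otimes\mathcal{S}_{P_i}$ is a PSR for $\mathcal{H}$.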

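We now prove statement 2, that there exists a counting acceleration for
$(\mathcal{H}, \otimes\mathcal{S}_{P_i})$. We assume for conciseness that for
each counting acceleration $\mathrm{post}^*_i$ on
$(\mathcal{H}_i, \mathcal{S}_{P_i})$, the image by $\mathrm{post}^*_i$ of each
$s_p \in S_{P_i}$ is a singleton. The extension to
$Im(\mathrm{post}^*_i) \subseteq \mathcal{P}_f(S_{P_i})$ is straightforward but
cumbersome.

We define the acceleration post* on
$(\mathcal{H}, \mathcal{P}_f(\otimes S_{P_i}))$ as follows. Assume that the
$\mathrm{post}^*_i$ are defined by
$\forall i, \mathrm{post}^*_i(r_i, (w_i, \Phi_i)) = (w_i', \exists w_i, \theta\, \Phi_i \wedge \varphi_i(w_i, w_i', \theta))$.
Then we define the acceleration post* on the whole system $\mathcal{H}$ by

$\mathrm{post}^*(r_1 \times r_2, (w_1, w_2, \Phi(w_1, w_2))) = (w_1', w_2', \exists w_1, w_2, \theta\, \Phi(w_1, w_2) \wedge \varphi_1(w_1, w_1', \theta) \wedge \varphi_2(w_2, w_2', \theta))$.

First we show that
$r^i(\gamma(w_1, w_2, \Phi)) = \gamma(w_1', w_2', \exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2 \wedge \theta = i)$ (*).

\begin{align*}
r^i(\gamma(w_1, w_2, \Phi))
&= r^i\Big(\bigcup_{v_1, v_2 \in \Phi} \gamma_{1,a}(w_1, v_1) \times \gamma_{2,a}(w_2, v_2)\Big)
  && \text{by definition of } \gamma \\
&= \bigcup_{v_1, v_2 \in \Phi} r^i\big(\gamma_{1,a}(w_1, v_1) \times \gamma_{2,a}(w_2, v_2)\big) \\
&= \bigcup_{v_1, v_2 \in \Phi} r_1^i(\gamma_{1,a}(w_1, v_1)) \times r_2^i(\gamma_{2,a}(w_2, v_2))
  && \text{since } \mathcal{H} \text{ is weak heterogeneous} \\
&= \bigcup_{v_1, v_2 \in \Phi} r_1^i(\gamma_1(w_1, w_1 = v_1)) \times r_2^i(\gamma_2(w_2, w_2 = v_2))
  && \text{see remark 2} \\
&= \bigcup_{v_1, v_2 \in \Phi} \gamma_1(w_1', \exists w_1, \theta\, w_1 = v_1 \wedge \varphi_1(w_1, w_1', \theta) \wedge \theta = i) \times \gamma_2(w_2', \exists w_2, \theta\, w_2 = v_2 \wedge \varphi_2(w_2, w_2', \theta) \wedge \theta = i)
  && \text{remark 3, the } \mathrm{post}^*_i \text{ are counting accelerations} \\
&= \bigcup_{v_1', v_2' \in \exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2 \wedge \theta = i} \gamma_{1,a}(w_1', v_1') \times \gamma_{2,a}(w_2', v_2')
  && \text{same idea as for post} \\
&= \gamma(w_1', w_2', \exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2 \wedge \theta = i).
\end{align*}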
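Now let us prove that
$\gamma(\mathrm{post}^*(r, (w_1, w_2, \Phi))) = r^*(\gamma(w_1, w_2, \Phi))$ (**).

\begin{align*}
r^*(\gamma(w_1, w_2, \Phi))
&= \bigcup_{i \in \mathbb{N}} r^i(\gamma(w_1, w_2, \Phi)) \\
&= \bigcup_{i \in \mathbb{N}} \gamma(w_1', w_2', \exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2 \wedge \theta = i)
  && \text{from (*) above} \\
&= \bigcup_{i \in \mathbb{N}} \; \bigcup_{v_1', v_2' \in (\exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2 \wedge \theta = i)} \gamma_{1,a}(w_1', v_1') \times \gamma_{2,a}(w_2', v_2')
  && \text{definition of } \gamma \\
&= \bigcup_{v_1', v_2' \in (\exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2)} \gamma_{1,a}(w_1', v_1') \times \gamma_{2,a}(w_2', v_2') \\
&= \gamma(w_1', w_2', \exists w_1, w_2, \theta\, \Phi \wedge \varphi_1 \wedge \varphi_2) \\
&= \gamma(\mathrm{post}^*(r, (w_1, w_2, \Phi)))
\end{align*}

(**) proves we have an acceleration and (*) proves also that this is a counting
acceleration. □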
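
The role of the shared iteration variable can be illustrated by a short Python sketch. It assumes two counter components with hypothetical relations w1 -> w1 + 2 and w2 -> w2 + 3, represents formulas as Python predicates, and replaces existential quantification over the naturals by enumeration over a small range; none of this reflects how Presburger formulas are handled in an actual tool.

```python
from itertools import product

BOUND = 6  # every quantified variable ranges over 0..BOUND (finite stand-in for N)
RANGE = range(BOUND + 1)


def counting_phi(c):
    """phi(w, w', theta) for the relation w -> w + c: w' = w + c * theta."""
    return lambda w, w_new, theta: w_new == w + c * theta


def sync_post_star(phi1, phi2, Phi):
    """Synchronized acceleration: exists w1, w2, theta. Phi and phi1 and phi2, one shared theta."""
    def formula(w1_new, w2_new):
        return any(Phi(w1, w2) and phi1(w1, w1_new, t) and phi2(w2, w2_new, t)
                   for w1, w2, t in product(RANGE, repeat=3))
    return formula


def cartesian_post_star(phi1, phi2, Phi):
    """Unsynchronized product: each component has its own iteration counter."""
    def formula(w1_new, w2_new):
        return any(Phi(w1, w2) and phi1(w1, w1_new, t1) and phi2(w2, w2_new, t2)
                   for w1, w2, t1, t2 in product(RANGE, repeat=4))
    return formula


def models(formula):
    """All pairs in the bounded window that satisfy the formula."""
    return {(a, b) for a, b in product(RANGE, repeat=2) if formula(a, b)}


def iterate_exact(start, c1, c2):
    """Reference: iterate the product relation (w1 + c1, w2 + c2) inside the window."""
    result, (a, b) = set(), start
    while a <= BOUND and b <= BOUND:
        result.add((a, b))
        a, b = a + c1, b + c2
    return result


def initial(w1, w2):
    """Initial formula Phi: both counters are zero."""
    return w1 == 0 and w2 == 0


phi1, phi2 = counting_phi(2), counting_phi(3)
sync = models(sync_post_star(phi1, phi2, initial))
cart = models(cartesian_post_star(phi1, phi2, initial))
print(sync == iterate_exact((0, 0), 2, 3), sync < cart)
```

Sharing theta between the two component formulas is the only difference between the two constructions, and it is what makes the synchronized result coincide with the iteration of the product relation.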



4.2 Existing Presburger Symbolic Representations 

Results on existing counting accelerations are summarized in the table below,
to give the reader an idea of the systems we can manipulate.

                    data                  effective symbolic   post* / counting
                                          representation
SRE                 lossy queues          yes                  yes / no
QDD                 queues/stacks         yes                  restricted / no
SLRE                queues/stacks         yes                  restricted / no
BDD                 boolean               yes                  no
CQDD                queues/stacks         yes                  yes
UBA/NDD             counters              yes                  yes
RVA                 clocks and counters   yes                  yes
union of convexes   clocks and counters   yes                  no
DBMs                clocks                yes                  no
CPDBMs              clocks and counters   ⊑ semi-decidable     semi-decidable


Theorem 7. Given the previous results, a symbolic representation and an
acceleration function can be automatically computed for weak heterogeneous
systems manipulating counters, clocks, perfect FIFO queues and stacks. □



4.3 About Termination 

We can automatically combine acceleration functions. Now the issue is the
guarantee of termination. Of course it is undecidable in the general case. But
there are interesting particular cases. For example, if each projection of the
system on each data type is flat, i.e. is computable using an acceleration
function, is the whole system flat or not (*)? We did not investigate this
question deeply. However we briefly sketch here a first result of flatness for
a whole system, obtained from the study of its projections.

We define the projection of a system $\mathcal{H}$ on a data type $D_i$ as the
system $\mathcal{H}'$ obtained from $\mathcal{H}$ by projecting all the
relations of $\mathcal{H}$ on $D_i$, and removing the identity relation. We
introduce controlled systems, i.e. systems with a control graph. These are
particular cases of systems, seeing their domain as a Cartesian product of
arbitrary sets and a finite set of control states. Usual systems are controlled
systems. A controlled system is structurally flat when its control graph is
flat (no nested loop). Structural flatness implies flatness. It is
straightforward that if each projection of a weak heterogeneous system is
structurally flat, then the whole system is flat (while not necessarily
structurally flat).

This is a weak version of (*), and we plan to work more on this aspect.



5 Toward a Tool for Generic Weak Heterogeneous 
Systems 

The previous results can be used to design software for the verification of
arbitrary weak heterogeneous systems. The idea is to have a generic algorithm
working on a generic PSR, and a library of PSRs for different data. Then ad hoc
symbolic representations are automatically computed using theorem 6.

5.1 The Generic Symbolic Representation 

We consider a generic Presburger symbolic representation (GPSR), which must
provide the $\cup$, $\sqsubseteq$, post and post* operators. Considering
theorem 6, implementing the composition operator
GPSR $\otimes$ GPSR $\to$ GPSR is quite straightforward. Then given a weak
heterogeneous system $\mathcal{H}$ with domains $D_1, \ldots, D_n$ and
relations in $\mathcal{R}_1 \times \ldots \times \mathcal{R}_n$, given a
library of Presburger symbolic representations $PSR_i$ (implementing GPSR)
available for each $(D_i, \mathcal{R}_i)$, we can automatically compute the
class $\otimes_i PSR_i$ implementing GPSR for $\mathcal{H}$.

5.2 The Generic Algorithm 

Basically, tools implementing effective accelerations use a similar
semi-algorithm to compute the reachability set. They use the classic iterative
fixpoint computation, firing transitions one by one. The difference is to add
meta-transitions to the set of transitions, and fire them with the post*
operation. The problem comes down to finding these meta-transitions. They may
be provided by the user [LAS], or automatically found. In this case they can be
computed statically [BFL04] or dynamically [ABS01]. The first approach is easy
to extend to any data type, while the second needs a notion of “progress” that
is difficult to define.
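
The fixpoint computation with meta-transitions can be sketched as follows. This is not the algorithm of any particular tool: it is a minimal Python illustration assuming a single counter, unguarded increments as transitions, linear sets (base, periods) as symbolic states, meta-transitions supplied by the caller, and a sound but incomplete inclusion test. All names, including the `max_steps` budget, are hypothetical.

```python
from collections import deque


def representable(n, periods):
    """True if n is a non-negative integer combination of the (positive) periods."""
    if n < 0:
        return False
    ok = [True] + [False] * n
    for v in range(1, n + 1):
        ok[v] = any(p <= v and ok[v - p] for p in periods)
    return ok[n]


def included(s1, s2):
    """Sufficient (sound, not complete) inclusion test between two linear sets."""
    (b1, p1), (b2, p2) = s1, s2
    return p1 <= p2 and representable(b1 - b2, p2)


def post(c, state):
    return (state[0] + c, state[1])


def post_star(c, state):
    return (state[0], state[1] | {c})


def reach(init, transitions, meta_transitions, max_steps=1000):
    """Breadth-first fixpoint; returns the symbolic states found, or None if the budget runs out."""
    reached, todo = [], deque([init])
    for _ in range(max_steps):
        if not todo:
            return reached          # fixpoint: nothing new to explore
        state = todo.popleft()
        if any(included(state, old) for old in reached):
            continue                # already covered by a known symbolic state
        reached.append(state)
        todo.extend(post_star(c, state) for c in meta_transitions)
        todo.extend(post(c, state) for c in transitions)
    return None                     # semi-algorithm: termination is not guaranteed


init = (0, frozenset())
print(reach(init, transitions=[2], meta_transitions=[2]))
print(reach(init, transitions=[2], meta_transitions=[], max_steps=50))
```

With the meta-transition the loop on the increment is absorbed into one symbolic state and the search stops after a handful of steps; without it the plain iteration keeps producing new states until the step budget is exhausted.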

5.3 Architecture 

We give the architecture (Fig. 1) of a tool operating on weak heterogeneous
systems with arbitrary data. The tool takes a weak heterogeneous system as
input. A specific symbolic representation and an acceleration technique are
computed from the symbolic representations available in the library, matching
the data types of the system. Then the generic semi-algorithm for reachability
set computation is instantiated with this specific symbolic representation.
When the user wants to use an unknown data type, he only has to provide the PSR
and the counting acceleration, instead of writing a new tool. And if the system
uses any combination of data types available in the library, nothing needs to
be done.




Fig. 1. Architecture of a tool for verification of arbitrary weak heterogeneous systems 



6 Conclusion and Perspective 

We focus here on the combination of existing symbolic representations and
accelerations. We defined subclasses for which this combination can be
automated. These subclasses turn out to be widespread in practice and capture
many existing works on numeric data, queues and stacks. Finally we have shown
how these theoretical results can be used to design a tool for the verification
of infinite systems with arbitrary data types. Future work includes 1)
theoretical work on getting more efficient combinations, heuristics and
termination results on the whole system from the projections on each data type,
as well as the implementation of a prototype and case studies; 2) defining
which logics are suitable for our construction and isolating some maximal
logics; 3) relaxing the hypothesis of exactness to combine a larger class of
symbolic representations, in a more precise way than with the Cartesian
product.
Abstract. Though model checking itself is a fully automated process, verifying
correctness of a hybrid system design using model checking is not. This paper 
describes the necessary steps, and choices to be made, to go from an informal 
description of the problem to the final verification result for a formal model and 
requirement. It uses an automotive control system for illustration. 



1 Introduction 

Hybrid systems are characterized by a non-trivial interaction between discrete and con- 
tinuous subsystems. A typical setting is a digital controller in an analog environment. 
This interaction makes formal verification of hybrid systems not just tedious, but intrin- 
sically difficult. In recent years the field of hybrid systems has seen significant advances. 
The hybrid automaton model has been widely adopted as standard for describing hybrid 
systems [1], and model checking has been shown to be decidable for important classes 
of hybrid systems and undecidable in general [11,17]. This research has resulted in a 
number of tools for model checking of hybrid systems such as HyTech [9], Verishift [16],
d/dt [5] and CheckMate [3]. 

These tools and other techniques have been applied to a number of case studies in 
the domain of automotive control, robotics, avionics or process control. Examples can 
be found in the proceedings of the Workshop Hybrid Systems: Computation and Control 
(HSCC) [2] and in the proceedings of its predecessors. Despite successful applications 
of verification tools, it has been questioned if these techniques scale to real life prob- 
lems, i.e. problems with a complexity that can be encountered in industry. The DARPA 
project Model Based Integration of Embedded Software (MoBIES) includes Open Ex- 
perimental Platforms (OEPs) to assess the limits of current technology for hybrid systems 
verification. This paper considers the Electronic Throttle Control (ETC) problem of the 
automotive OEP. Given a Simulink/Stateflow model and an informal description of the
system and requirements, the task is to take the model, translate it if necessary, and to verify
the system requirements. 

This paper uses the ETC case study to illustrate the process that leads from the
informal specification to verification. In [13] Henzinger et al. review several case studies
performed with the tool HyTech. Their review presents criteria to decide when model
checking with HyTech is promising, and discusses shortcomings of HyTech and future
directions for tool development. The authors conclude, for example, that the number of
continuous variables is a limiting factor. In [19] Rushby describes the use of verification
in the design process of critical systems, provides an introduction to different formal
methods and their main concepts, and identifies steps in the design process where formal
methods could contribute to the quality of the design. In contrast with this related work,
we take the known limitations of hybrid verification as given, and assume that the decision
to use formal verification has been made.

2 From Simulation to Verification 

The ideal situation for hybrid verification would be to start from a formal model that
either has the desired input format or can be translated automatically into it. In addition,
the complexity of the verification model should be within the capabilities of the model
checker. The property to be checked should be given as a proper temporal logic formula,
or in whatever formalism is required. Given these preconditions verification would indeed
be a push-button technology.

It is, however, more likely that the starting point for formal verification is an informal
system description. The description may be accompanied by a simulation model or
implemented code. This information is then used to build the verification model manually.
A first step in this process is to understand the mathematical relationships that govern
the system. Examining the simulation model and studying several simulation runs can
be very useful in this process.

A mathematical model alone is not sufficient. Model checking only makes sense if
there are properties to be checked. As with modelling, there is often more to deriving
verification requirements than simply translating given informal requirements. Some
requirements might be implementation details, such as the target platform to be used.
For other requirements it may be sufficient to simulate the model to check them. Only a
few requirements need to be verified formally.

Given the requirements and the mathematical system model one can start building a
verification model. Having both the properties and the mathematical model is important
for determining which aspects of the system have to be included in the verification model.

A necessary next step, to deal with complexity, might be to divide the verification
problem into sub-problems. A well-known approach is assume-guarantee reasoning, which
introduces a number of tractable verification problems that, when verified individually,
imply the correctness of the requirement [10].

A last step in this verification process is to set up the model checking algorithm.
This might include choosing a proper size for a hash table, defining a proper exploration
order, or, since many hybrid system tools rely on numerical routines, choosing suitable
numeric tolerances.

To verify the ETC requirements we will take the following steps:

1. Discover the mathematical model

2. Obtain the formal requirements

3. Build a verification model

4. Set up the verification problem

5. Set up the model checking algorithm




Hybrid System Verification Is Not a Sinecure 265 



These steps will be part of any verification. In practice there is often no clear distinction
between the separate steps, and steps 3 to 5 may be iterated a few times.

The next section describes the hybrid systems model checker CheckMate. The re- 
maining sections will then discuss each of the above steps in more detail, and use the 
ETC case study for illustration. 

3 A Brief Introduction to CheckMate 

CheckMate is a model checker for polyhedral invariant hybrid automata (PIHA) [3], a
slightly restricted class of hybrid automata. Like hybrid automata, PIHAs have a finite
number of control locations. In each location a set of differential equations governs
the continuous evolution of the continuous state variables. Locations are switched as
soon as switching conditions become true. These switching conditions are defined as
conjunctions of linear inequalities. Transitions can also reset the continuous state vector
by applying an affine mapping.

The model-checking algorithm of CheckMate partitions the state space, and
over-approximates the transition relation using flowpipe approximations. CheckMate then
model checks the obtained abstraction against an ACTL specification. ACTL is a subset
of CTL (computation tree logic) that states universal properties, that is, properties that
are true for all trajectories of the system [4].

A flowpipe is the set of all states that are reachable from a given initial set by
continuous evolution. A flowpipe can be viewed as a bundle of trajectories. CheckMate
uses polyhedra to over-approximate flowpipes. This has the advantage that intersections
of approximations with switching conditions and invariants again yield polyhedra. The
basic steps of the model checker are manipulations of polyhedra, computing flowpipe
approximations, and model checking the resulting finite-state abstraction.

For a differential equation x = f{x), with x G K", let (p(xo,t) be the solution 
at time t with initial point xg- Given an initial set AT (0) C K", we define a flowpipe 
segment from G to t 2 as the set {x|3a;o G X(0),t G [^ 1 ,^ 2 ]- x = (p(xo,t)}. The 
over-approximation of this segment is computed as follows (illustrated in Figure 1): 

1. For the vertices a;oi of X(0) compute G) and f 2 )- Check- 
Mate uses numerical integration to compute these points. 

2. Compute a polyhedron that encloses these points. CheckMate computes either con-
vex hulls or oriented hyper-rectangles [21], depending on an option set by the user. Later
in this paper we discuss implications of this choice. This polyhedron is an initial
guess, and does not need to include the complete flowpipe segment.

3. Determine the linear inequalities $c_i x \le d_i$, with $c_i \in \mathbb{R}^n$ and $d_i \in \mathbb{R}$, that define
the initial polyhedron.

4. Solve for each face of the polyhedron the optimization problem

$$d_i = \max_{x_0 \in X(0),\; t \in [t_1,t_2]} c_i\,\varphi(x_0,t) \qquad (1)$$

The conjunction of the inequalities $c_i x \le d_i$ then defines an over-approximation of
the flowpipe segment, i.e. of all points that are reachable from $X(0)$ within the time
interval $[t_1,t_2]$.

Fig. 1. Steps in the flowpipe approximation are (1) simulating the vertices, (2) enclosing the
simulation points in a polyhedron, (3) determining the normals and (4) increasing the size of the 
polyhedron, until it contains the complete segment. 
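The four steps can be sketched for a small example. Everything specific here is an assumption made for illustration: the linear system (chosen so that the solution is available in closed form instead of by numerical integration), the axis-aligned box that stands in for a convex hull or oriented hyper-rectangle, and the time sampling that stands in for the optimization in step 4. Sampling alone does not give a guaranteed bound, so this is not a sound replacement for the optimization.

```python
import math
from itertools import product


def phi(x0, t):
    """Closed-form solution of x' = A x with A = [[-1, -2], [2, -1]]."""
    decay, c, s = math.exp(-t), math.cos(2 * t), math.sin(2 * t)
    return (decay * (c * x0[0] - s * x0[1]),
            decay * (s * x0[0] + c * x0[1]))


def support(c, points):
    """Largest value of c . x over a finite set of points."""
    return max(c[0] * x + c[1] * y for x, y in points)


def flowpipe_segment(box, t1, t2, samples=200):
    vertices = list(product(*box))
    # Step 1: simulate the vertices of X(0) to the times t1 and t2.
    endpoints = [phi(v, t) for v in vertices for t in (t1, t2)]
    # Steps 2 and 3: enclose the points in a box, described by four normals c_i.
    normals = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    guess = [support(c, endpoints) for c in normals]
    # Step 4: push each face outwards, d_i = max c_i . phi(x0, t).
    # For a linear system the maximum over X(0) is attained at a vertex,
    # so only the time interval is sampled.
    times = [t1 + (t2 - t1) * k / samples for k in range(samples + 1)]
    reached = [phi(v, t) for v in vertices for t in times]
    bounds = [support(c, reached) for c in normals]
    return normals, guess, bounds


normals, guess, bounds = flowpipe_segment([(0.9, 1.1), (-0.1, 0.1)], 0.0, 1.0)
```

The time interval in this example is deliberately long, so that the trajectories bulge out between the two simulated time points and the initial guess has to grow in step 4.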



Extending the flowpipe approximation to PIHAs with parametric differential equa-
tions $\dot{x} = f(x,p)$, where $p$ is an unspecified constant parameter, is straightforward. We
assume that $p$ is an element of a bounded polyhedron $P$ in $\mathbb{R}^m$. In the first step, we
simulate all vertices of $X(0)$ for all vertices of $P$. In the next two steps, we compute
the enclosing polyhedron of the simulation points, as before. The last step includes the
parameter in the optimization problem (1):

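$$d_i = \max_{x_0 \in X(0),\; t \in [t_1,t_2],\; p \in P} c_i\,\varphi(x_0,t)$$

where $\varphi(x_0,t)$ now denotes the solution of $\dot{x} = f(x,p)$ for the parameter value $p$.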



This defines a polyhedron that includes all states that are reachable from $X(0)$ with
parameter values in $P$ and within the time interval $[t_1,t_2]$. Note that while the parameter
is assumed constant during continuous evolution, it may change non-deterministically
when the analysis evaluates a discrete transition. This, however, also includes the case
that the parameter remains constant, and the analysis is therefore conservative.
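A scalar example shows how the parameter enters the bounds. The system $\dot{x} = -p\,x$, the ranges for $x_0$ and $p$, and the function names are assumptions chosen for illustration. The sketch evaluates only the end points of both ranges, which is sufficient here because the solution is monotone in $x_0$, $p$ and $t$; in general the optimization has to range over all of $X(0)$ and $P$.

```python
import math
from itertools import product


def phi(x0, t, p):
    """Solution of the scalar example x' = -p * x."""
    return x0 * math.exp(-p * t)


def parametric_bounds(x_range, p_range, t1, t2, samples=50):
    """Bounds on phi over the end points of X(0) and P and a grid on [t1, t2]."""
    times = [t1 + (t2 - t1) * k / samples for k in range(samples + 1)]
    values = [phi(x0, t, p)
              for x0, p in product(x_range, p_range)
              for t in times]
    return min(values), max(values)


# X(0) = [1, 2], P = [0.8, 1.2], time interval [0.5, 1.0]
lower, upper = parametric_bounds((1.0, 2.0), (0.8, 1.2), 0.5, 1.0)
```

In one dimension the two faces have the normals $+1$ and $-1$, so `upper` and `-lower` play the role of the two values $d_i$.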



4 Discovering the Mathematical Model 

The first step towards verification is to get an understanding of the system behavior. 
The essential components of the system, the control structure, and the physical laws that 
govern the behavior must be identified. Information from an informal description may 
be supplemented by a simulation model. 

Fig. 2. The angle of the throttle plate determines the air flow, and indirectly the engine speed.

The mathematical model captures different system characteristics and should reflect
the following aspects:



- Physical and mechanical laws, or chemical, biological processes. These are typically 
described as differential equations or inequalities, partial differential equations, and 
algebraic constraints. 

- Switching conditions. Even if the state of a component evolves continuously, it may 
switch autonomously, for example in an elastic collision. 

- Time scale. For example determined by the poles of a linear time-invariant system, 
or by the sampling rate of a sensor. 

- Switching logic. May be modelled as a state chart or as a finite state machine. Control
logic might also be given as a program, e.g. as relay ladder logic or sequential function
chart for PLCs.

- Control laws. Often defined by continuous-time feedback controllers or discrete 
time difference equations; in other applications completely encoded in the switching 
logic. 

- Communication among components. Can be synchronous or asynchronous, with 
shared events or variables, using message buffers, channels, broadcasting, interrupts, 
or a combination of those. 

If we are given a formal model, rather than an informal description, then the se- 
mantics define the mathematical model. In this case this model may be suitable directly 
for verification. The ETC system, however, was presented as a MATLAB/Simulink model,
accompanied by an informal description [7,6]. We use information from simulation and
the informal description to develop the mathematical model manually.

The ETC system replaces the mechanical link between pedal and throttle plate. Figure 
2 depicts the throttle plate as part of the powertrain. The throttle plate angle determines 
the airflow to the combustion chamber, and controls thus (along with the amount of 
injected fuel) the engine torque. The task of the ETC is to control the throttle angle, 
based on current control mode and human input. 

The ETC system comprises a pulse-width modulation (PWM) driver, an actuator (a 
DC motor), the mechanical system (the throttle and spring), sensors and a controller (Fig. 
3). The plant dynamics, i.e. the DC motor and the throttle behavior, are modelled as non-
linear dynamic systems in Simulink. The PWM driver, the switching logic in the ETC
controller, and the scheduler are modelled as Stateflow diagrams. Control laws, such
as a sliding mode controller, are modelled as difference equations. Connections between
blocks are continuous-time real-valued signals. Switching in the simulation model occurs
either by triggered transitions, or by discontinuous Simulink blocks, such as the sign-
function block. The remainder of this section describes the mechanical system and the
ETC controller in more detail.

Fig. 3. An automotive throttle control system.



The Mechanical System. The behavior of the throttle plate is governed by spring dynam-
ics, Coulomb friction, viscous friction (airflow) and input torque. A changing current in
the mechanical system induces an electromotive force (back EMF) that opposes the
change. The back EMF is a feedback to the actuator. The mechanical system is a second- 
order nonlinear continuous time-invariant system; the Coulomb friction accounts for the 
non-linearity. 



The ETC controller. The ETC controller has several levels of hierarchy. The top level is 
a Stateflow diagram, with four normal modes, two failure modes and a startup mode. The 
human control mode uses a sliding-mode controller. All other modes are merely place- 
holders for undefined implementation details. The controller delivers in those modes just 
a constant and meaningless output. 

The ETC controller uses a fifth-order filter (with poles -80, -80, -90, -90, -100) to 
smooth the input from the human driver (the sensor output). The performance of this 
filter determines in part whether the controller meets its performance requirements. 

Sliding-mode controllers are commonly used in control applications, since they are 
very adaptable and versatile [22]. Sliding-mode controllers are designed as follows: 
First, define a surface in the state space, and show that states on the surface behave as 
desired. Next, design for each side of the sliding surface a control law that drives the 
system to the sliding surface, as illustrated in Figure 4 (a). As soon as the system hits 
the sliding surface, close enough to an equilibrium point, it stays on the surface and 
converges to the equilibrium point. 
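The two design steps can be written out for a generic second-order system. This is a textbook-style sketch and not the ETC controller: the error dynamics $\ddot{e} = u + w$, the disturbance bound $W$ and the constants $\lambda$, $K$, $\eta$ are assumptions introduced here.

```latex
% Assumed error dynamics: \ddot{e} = u + w, with a bounded disturbance |w| \le W.
% Step 1: define the surface and check the behavior on it.
s = \dot{e} + \lambda e, \qquad \lambda > 0
s = 0 \;\Rightarrow\; \dot{e} = -\lambda e \;\Rightarrow\; e(t) = e(t_0)\, e^{-\lambda (t - t_0)}
% Step 2: a control law for each side of the surface.
u = -\lambda \dot{e} - K\,\mathrm{sign}(s), \qquad K \ge W + \eta, \quad \eta > 0
% Reaching condition: the surface is reached after at most |s(0)| / \eta time units.
\dot{s} = \ddot{e} + \lambda \dot{e} = w - K\,\mathrm{sign}(s)
\;\Rightarrow\; s\,\dot{s} = s\,w - K\,|s| \le -\eta\,|s|
```

The sign function in the control law is the source of the switching at the surface that matters later for verification.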

The model of the controller contains, in addition to the Stateflow model, the sliding- 
mode controller, the place holders for the other control modes, the fifth-order filter, 
blocks that model sampling of input and output, fault detection, delays, a scheduler, and 
finally signals that interconnect all components.

Fig. 4. Illustration of the concept of sliding-mode control.



5 Obtaining the Formal Requirements 

When handed an informal description of the system requirements, only some will be 
suitable for verification. We can distinguish three types of requirements. 

- Implementation requirements. They impose certain implementation details that can 
be checked statically. They vary from requirements on floating point precision, 
platform, controller, programming language, clock-speed, scheduling policy or input 
range of sensors. There is no need to examine the dynamic system behavior to check 
these requirements. 

- Requirements on representative behaviors. Those requirements define an acceptance 
criterion for a single execution of the system. Satisfaction of the requirement can 
be established by a single run of the simulation model; we will refer to them as 
simulation requirements. These requirements often serve as testing scenarios. 

- Requirements on classes of behaviors. These requirements define a possibly infinite 
set of acceptable behaviors, of possibly infinite length. A typical example would 
be a liveness property such as "Each request is always eventually granted", which 
is defined for runs of infinite length. We refer to these requirements as verification
requirements, since they require a formal proof. 

The informal description of the ETC lists seven requirements. They include imple- 
mentation requirements as well as simulation and verification requirements. We will 
focus on a few of these requirements for illustration. 

The requirement that the nominal battery voltage should be 12 V is a typical im-
plementation requirement. There is no need to use simulation or formal verification, and
correctness can be proven by inspection of the corresponding parameter.

The rise-time requirement for the ETC is a requirement on representative behaviors. 
The rise time is defined as “the time required for the throttle plate angle response to a 
step change in pedal position to rise from 10% of the steady-state value to 90% of the 
steady-state value”. The informal description continues, “The rise time for step changes 
from closed to fully open is 100ms (...)”. The requirements thus put bounds on these 
times, given a particular change in the input signal. Whether this requirement holds can
be answered by a single simulation with the test input. The simulation test shows that
the rise time requirement is indeed satisfied.

Fig. 5. The throttle angle (solid) in response to an input (dashed) that changes between a wide
open throttle, and a fully closed throttle.
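Checking such a simulation requirement amounts to measuring the 10% to 90% rise time on one sampled response. The sketch below does this for a hypothetical first-order step response with time constant `tau`; the response, the sampling step and the function name are assumptions, and the ETC simulation output would take the place of `response`.

```python
import math


def rise_time(times, values, steady_state, lo=0.10, hi=0.90):
    """Time between the first crossings of lo and hi times the steady-state value.

    Assumes a positive steady-state value that the response actually reaches.
    """
    t_lo = next(t for t, y in zip(times, values) if y >= lo * steady_state)
    t_hi = next(t for t, y in zip(times, values) if y >= hi * steady_state)
    return t_hi - t_lo


tau = 0.02                                   # hypothetical time constant in seconds
times = [k * 1e-4 for k in range(3000)]      # 0.3 s sampled every 0.1 ms
response = [1.0 - math.exp(-t / tau) for t in times]

measured = rise_time(times, response, steady_state=1.0)
meets_requirement = measured <= 0.100        # 100 ms bound from the description
```

For a first-order response the exact value is `tau * ln(9)`, which gives a simple sanity check for the measurement.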

Another requirement was informally expressed as: "[The] throttle plate shall never hit
the stops" [7, p. 10]. This requirement has to hold, no matter the input or operation mode 
(failure modes are excluded). This requirement is a candidate for formal verification. 
However, running just a few simulations shows that it is possible to reach the upper 
bound with a positive velocity, as can be seen in Figure 5, at approximately time 0.2 
seconds. Formal verification is therefore not necessary. The counterexample proves that 
the requirement is not satisfied. 

A verification requirement for the ETC is: The system will, after a step input, always 
eventually enter a certain neighborhood of the steady-state, and remain there forever. 
This property has an unlimited time horizon, unlike simulation requirements. In addition 
we assume that spring constant and spring equilibrium may deviate from their nominal 
values by up to 20%. Rather than defining infinitely many behaviors for a single system, 
this requirement defines a single behavior of infinite length for an infinite class of systems.
Simulations in contrast require the parameters to be known exactly. In the verification
model, these parameters are only known within bounds, and the verification covers all
parameter values within these bounds.
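One possible way to write this requirement in ACTL is given below. The formula is a sketch and not quoted from the informal description; the symbols $\alpha$ (throttle angle), $\alpha_{ss}$ (steady-state value), $\delta$ (size of the neighborhood), $k$ (spring constant) and $\theta_{eq}$ (spring equilibrium) are names assumed here.

```latex
% "Always eventually enter the neighborhood and remain there forever":
\mathbf{AF}\,\mathbf{AG}\,\bigl(\,|\alpha - \alpha_{ss}| \le \delta\,\bigr)
% to hold for all parameter values within 20% of the nominal values:
k \in [\,0.8\,k_{nom},\; 1.2\,k_{nom}\,], \qquad
\theta_{eq} \in [\,0.8\,\theta_{nom},\; 1.2\,\theta_{nom}\,]
```

Both path quantifiers are universal, so the formula stays within the ACTL fragment.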

This verification requirement was not part of the informal description for the ETC. 
We introduced it, since all requirements in the informal description could be checked 
easily by inspection or simulation. As a side note, in the cases where counterexamples
were found, there was a subjective acceptance criterion, that considered these violations 
as not significant. The requirements are not just true or false, but certain violations are 
acceptable within a certain, albeit subjective range. 



6 Obtaining the Verification Model 

A limiting factor in hybrid system verification is the number of continuous variables and 
the number of control locations [13]. Verifying a model with a fifth-order system just to 
filter the input is already challenging. The ETC system has in addition a few character-
istics that make verification of the full model impossible. As in other applications, we
do not verify the implementation model but a scaled-down version [19].

A well known technique for scaling down hybrid system models is abstraction [10]. 
An abstraction preserves the essential behavior of the original system. It is guaranteed 
that if a safety property holds for the abstraction, it also holds for the original system. But 
techniques from system and control theory, such as order-reduction and linearization, 
can also be useful to obtain proper approximations of the original system. 
Considerations when building a verification model are the following. 

- First, the verification model must be in the class of systems that the model checker 
can handle. Or vice versa, one has to choose a tool that can handle the class of 
systems. 

- The kind of dynamic behavior. Models with non-linear dynamics are harder to 
analyze than models with linear dynamics, which are harder to analyze than multi- 
rate problems. 

- The number of continuous variables. Even if the problem has simple dynamics, 
additional continuous variables add complexity. 

- The switching behavior of the system. Even systems with few locations can have un- 
desirable switching behavior. A particular example is Zeno-behavior, i.e. a behavior 
that exhibits an infinite number of discrete transitions in finite time. 

We illustrate the process of obtaining a CheckMate model for the ETC case study. 
The starting point is the OEP model. The OEP model serves two purposes: It is used 
for simulation studies, and it is used as a blueprint for implementation. There is limited 
incentive to be concerned about complexity, since it is used for simulation rather than 
verification. As a blueprint for implementation, the OEP model contains details such as
what task has to run on what platform and under which scheduling policy. On the other 
hand, when implementation details are unknown it contains empty subsystems that serve 
as placeholders for future implementation. 

PIHAs are continuous-time models* and can include non-linear dynamics. However, 
some non-linearities can cause numerical problems, and a linear abstraction might be 
easier to analyze. The number of variables is a concern, too; models with more than 5 
continuous variables are typically hard to analyze. Verification for more than 10 contin- 
uous variables is in most cases impossible. Furthermore, CheckMate assumes no two 
transitions can happen in zero time, which in particular excludes certain Zeno-behavior. 

Obtaining a Continuous-Time Model. Since CheckMate models are continuous-time, 
the discrete-time components of the OEP model have to be replaced by appropriate 
continuous-time components. Discrete-time components in the OEP model are the PWM
driver, the sensors and the ETC controller.

The Simulink model of the ETC controller has only one mode with meaningful 
dynamics, the human control mode. We omit in the verification model all other modes, 
and can omit also the control logic. Filter and sliding-mode controller were designed 
as continuous-time models, but then discretized to become part of the ETC controller. 
Hence, we replace them by their continuous-time equivalents. The sampling times of
the PWM and the sensors are a few orders of magnitude smaller than the time scale of
interest (100ms), and we can replace them by gains.

* There are extensions of CheckMate that allow discrete time and sampled-data models [20,15].

Fig. 6. The left figure illustrates chattering at the sliding surface. The right figure illustrates the
effect of a boundary layer.

Resolving Zenoness. CheckMate assumes, as most hybrid model checkers do, that all 
behaviors are non-Zeno. The sliding-mode controller, in contrast, intentionally drives 
the system to a surface where infinite, and even uncountable switching occurs (in the 
continuous-time realization). If we use a fixed-step integration routine the solution will
start chattering, which may lead to unreliable results. If we use a variable-step integration 
routine the procedure tends to get stuck on the sliding surface. 

To resolve this problem, we define a boundary layer (or $\varepsilon$-neighborhood) around the
sliding surface. We apply the sliding-mode controller outside of this $\varepsilon$-neighborhood,
and replace it inside by a control law that is continuous and drives the system to the 
surface. The controller is thus equivalent to the original controller outside the boundary 
layer, and there is a steep but continuous transition from one sliding mode to the other. 
On the sliding surface the control law is equal to the so-called equivalent controller. 
Figure 6 (b) depicts the basic idea of a boundary layer. The boundary layer leads to a 
numerically well-conditioned, non-Zeno, and close approximation of the ideal sliding-
mode behavior. Boundary layers are a very common approach used in physical systems
to mitigate the physical stress caused by chattering, which can lead to mechanical damage. A more
thorough discussion of reachability analysis for sliding-mode controllers can be found 
in [14]. 
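The effect of a boundary layer can be reproduced on a toy example. The sketch below is not the ETC model: it uses a double integrator, a fixed-step Euler scheme, and the common saturation function as the continuous law inside the layer, which is a simplification of the interpolation described above. All names and constants are assumptions.

```python
def sign(s):
    return (s > 0) - (s < 0)


def saturate(z):
    return max(-1.0, min(1.0, z))


def simulate(control, x=1.0, v=0.0, lam=2.0, dt=1e-3, steps=5000):
    """Fixed-step Euler simulation of x'' = u.

    Returns the final position and the number of sign changes of u.
    """
    switches, previous = 0, 0.0
    for _ in range(steps):
        s = v + lam * x          # sliding variable; the surface is s = 0
        u = control(s)
        if u * previous < 0:
            switches += 1
        previous = u
        v += dt * u
        x += dt * v
    return x, switches


K, eps = 5.0, 0.05
# Ideal sliding-mode law: discontinuous at the surface, chatters in simulation.
x_ideal, n_ideal = simulate(lambda s: -K * sign(s))
# Boundary layer: same law for |s| >= eps, continuous inside the layer.
x_layer, n_layer = simulate(lambda s: -K * saturate(s / eps))
```

Both controllers bring the position close to zero, but the discontinuous law changes the sign of its output at a large share of the integration steps, while the boundary-layer version switches only a handful of times.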

Modelling Non-Linearities. The mechanical system describes the dynamics of the throt-
tle plate. This second-order system is nonlinear due to Coulomb friction. We have to
decide whether to include this non-linearity and non-linearities caused by saturation
(actuator) and sliding-mode control as different modes, or as non-linearities. One ex-
treme choice would be to model the ETC as a non-linear hybrid system with a single mode.
The other extreme choice would be to model it with linear dynamics, which results in
18 modes for the ETC problem.

If we put the complete behavior in a single non-linear differential equation, the 
flowpipe-approximation gets worse and computationally more expensive when the vector 
field changes abruptly, e.g. when the system changes the sliding modes.

If we model the system with many modes but with linear dynamics, it will result
in a lot of switching. Each time the analysis algorithm encounters switching between
modes it uses over-approximations of previous steps, and over-approximation errors may
proliferate. Many modes thus lead to an increased over-approximation error. Based on
the above considerations we have chosen to model the Coulomb friction and saturation as non-
linearities, to reduce the number of modes, and to model the sliding-mode controller as
different modes, to avoid over-approximation errors due to sudden changes of the vector
field. This decision was made after running a number of experiments with different
models.

Reducing the Order. The ETC uses a fifth-order filter to smooth the input from the 
human driver. This means that the filter alone has more than twice as many state variables 
as the rest of the system. Since verification of hybrid systems becomes more difficult 
with each additional dimension, we reduce the order of the filter. We obtain a reduced-
order filter using the model-reduction capabilities of MATLAB’s System Identification
Toolbox. The combined dynamics of plant and reduced filter result in a fourth-order
system with nonlinear dynamics. For a more thorough discussion of order-reduction for
hybrid system verification see [8].



7 Setting Up the Problem 

This section addresses the problem that the requirement cannot be verified directly, due to
the size of the verification problem. A common approach is to decompose system and 
property into smaller problems [12,18]. We illustrate this idea, by decomposing the 
liveness property for the ETC into a series of properties, which can all be verified with 
CheckMate. 

The property that we verify is the following. Given that the system is in steady-state
with throttle angle $\alpha = 0$, assume a step change in the desired angle to 89.8 degrees
(which is the maximal input; the input has a safety margin of 0.2 degrees) at time 0.
Verify that the angle will always eventually reach a 2% neighborhood of the desired
angle, and remain there forever. We furthermore assume that the spring constant and
spring equilibrium may deviate from their nominal values by up to 20%. This means that they
may take any value in this range.

We define a cascade of subproblems to show that the system behaves as desired. For
each of the stages we use a variant of the basic CheckMate model. For each stage we
define a different initial and target set. The target set of one stage is then the initial set 
of the next stage. 

Transient phase. The first stage of the cascade deals with the transient phase when the 
throttle angle changes quickly in response to the step input. We show that all trajectories 
that start from the initial set - in this case the origin - hit the first target set, the so-
called outer box. Figure 7 (i) and (ii) depict projections of the flowpipe approximations
that show that all trajectories do indeed reach the outer box. The model checker verifies 
furthermore that the system will always reach this set eventually.

Fig. 7. Figures (i) and (ii) depict the flowpipe approximation for the transient phase. Figure (ii)
is a close up of Figure (i). Figure (iii) depicts the flowpipe approximation in the neighborhood
of the steady-state value. Finally, Figure (iv) shows that the states that start in the inner box will
eventually return to this set. Note that these figures show only projections of the flowpipe segments
onto throttle angle $\alpha$ and angular velocity $\omega$.



Regulation phase. The next stage is to show that all trajectories that start in the outer box 
will eventually reach the inner box. We use the same model as for the transient phase, 
but of course with the outer box of the transient phase as initial set, and the inner box as 
target set. Figure 7(iii) shows that the system starts in the outer box and all trajectories
converge quickly to a neighborhood of the steady-state. No segment of the flowpipe 
approximation violates the 2% bound. This guarantees that once the system enters the 
outer box, the inner box will be reached without violating the 2% bound. 

Asymptotic behavior. As the last step we show that the inner box, a neighborhood of 
the steady-state value, maps onto itself in a finite number of steps. CheckMate finds a 
flowpipe segment that is completely contained in the inner box, i.e. the initial set of this 
stage. This means all trajectories that start in the inner box, return to this set. None of 
the computed flowpipe segments of the over-approximation violates the 2% threshold. 

Figure 7(iv) depicts the result. Note that it is not sufficient to show that some flowpipe
segment is contained in another, since they are over-approximations. We cannot
assume that all states in a segment are actually reachable. But if some segment is inside
the initial set we know that this set is recurrent. All states that can be reached in a certain
time interval from the initial set will be contained in this segment, and thus also in the 
initial set. This completes the verification. 
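The structure of this argument, a chain of reachability checks in which each target set is the initial set of the next stage, can be illustrated with a toy example. The sketch replaces the ETC model and the flowpipes by a one-dimensional discrete-time system $x \mapsto a\,x$ with an uncertain gain $a$, analyzed with interval arithmetic; the system, the sets and all names are assumptions.

```python
def step(interval, gain):
    """Interval image of x -> a * x for all x in interval and all a in gain."""
    products = [a * x for a in gain for x in interval]
    return (min(products), max(products))


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def steps_to_reach(initial, target, gain, max_steps=100):
    """Number of steps after which the over-approximated reach set lies inside
    the target, or None if this cannot be shown within max_steps."""
    current = initial
    for n in range(1, max_steps + 1):
        current = step(current, gain)
        if contains(target, current):
            return n
    return None


gain = (0.4, 0.6)                  # uncertain parameter, constant sign
start, outer_box, inner_box = (10.0, 10.0), (-2.0, 2.0), (-0.5, 0.5)

# The target set of each stage is the initial set of the next stage.
stages = [("transient", start, outer_box),
          ("regulation", outer_box, inner_box),
          ("asymptotic", inner_box, inner_box)]
results = {name: steps_to_reach(init, target, gain)
           for name, init, target in stages}
```

The last stage checks recurrence: the over-approximated image of the inner box lies inside the inner box itself, which is the same containment argument as for the flowpipe segment above.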



8 Setting Up the Verification Algorithm 

The previous section presented the verification results. Getting the verification results is 
not only a matter of defining subproblems and corresponding models - which is some 
work by itself - getting the verification to run requires also a fair amount of tweaking 
the model checker. For finite-state model checkers this might entail choosing a proper 
order of the variables, for other model checkers it might entail finding a proper size
for the hash table. To give an impression of the kind of choices that have to be made 
for CheckMate, we elaborate on the choice when to use convex hulls and when to use 
oriented rectangular hulls in the approximation. 

This choice makes a difference in two different steps of the algorithm. As mentioned 
before, CheckMate obtains an initial approximation of the flowpipe segment by com-
puting a polyhedron that encloses the simulation points (step 2 of the flowpipe
approximation). The convex hull of these points is by definition the smallest convex
polyhedron that contains all points.
Using the convex hull in this step has the advantage that the over-approximation error is
small. A drawback is that the convex hull will also yield polyhedra with a lot of faces. 
Each additional face leads to one additional optimization problem in the last step of the 
flowpipe approximation procedure. 

CheckMate offers as an alternative the so-called oriented rectangular hull
(ORH) routine [21]. The ORH routine chooses an oriented hyper-rectangle that keeps
the over-approximation error small and limits at the same time the number of faces.
For the ETC model with four state variables, the ORH will result in a polyhedron
with exactly eight faces. If we use the convex hull approximation instead, CheckMate
computes polyhedra with up to 119 faces before it gets stuck. Using the ORH solves this
problem, and all segments of the approximation can be computed. 
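The difference in the number of faces can be seen on a small planar example. The sketch computes a convex hull with Andrew's monotone chain algorithm and compares it with an axis-aligned bounding box, which stands in for the ORH here (the ORH additionally chooses an orientation). The point set and function names are assumptions.

```python
def cross(o, a, b):
    """Twice the signed area of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def bounding_box(points):
    """One (min, max) interval per coordinate."""
    return [(min(c), max(c)) for c in zip(*points)]


# Hypothetical simulation points: an octagon with one interior point.
points = [(2, 0), (4, 0), (6, 2), (6, 4), (4, 6), (2, 6), (0, 4), (0, 2), (3, 3)]

hull_faces = len(convex_hull(points))       # in 2D: one edge per hull vertex
box_faces = 2 * len(bounding_box(points))   # a box in n dimensions has 2n faces
```

Each face costs one optimization problem in the last step of the flowpipe approximation, so the hull is twice as expensive as the box in this example. In four dimensions a hyper-rectangle has 2 * 4 = 8 faces, which matches the eight faces reported for the ETC model.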

Another point where the choice between the convex hull and the ORH routine mat-
ters is when CheckMate computes states reached by a discrete transition. To do so,
CheckMate intersects each segment of the flowpipe approximation with the switching
condition. If more than one segment intersects with a switching condition, the verification
algorithm proceeds with an over-approximation of the union of these intersections. This
over-approximation can either be the convex hull or the ORH of those sets. For this case 
study we found that the results of the ORH are too conservative. The over-approximation 
error soon becomes too large. 

To summarize, getting the verification to run requires a proper setup of the verification
algorithm. We use, for example, the ORH routine to compute the polyhedra of the
flowpipe approximation, and the less conservative convex hull routine to compute the
over-approximation of the intersections with switching conditions. Similar choices had
to be made to find the proper integration routine, and to choose parameters for numerical
integration and optimization routines.



9 Discussion

The starting point for the work presented in this paper was the OEP model and an informal 
description of the ETC system. The first step towards verification was to discover the 
underlying mathematical model. The OEP model was useful since it already provided 
information about the main components. When such a model is not present in the
beginning, building a simulation model can help significantly to understand the problem. 

The second step towards the verification was to formulate the requirements of the 
system. In our case we had an informal description to start from. We found that none 
of the given requirements was suitable for verification. Most of the requirements were 
simulation requirements that could be checked by running a simulation, or implemen- 
tation requirements that could be checked by inspection. For others we easily found 
counterexamples, and verification was not necessary either. 

There is no need to use verification to check simulation and implementation require- 
ments. Verification methods should not be used for the sake of verification. Verification 
should be used if the requirements are formulated for parametric models, uncertain initial 
conditions, or non-deterministic models. In that case formal verification can be valuable 
and complementary to simulation-based methods. We defined a liveness property for 
the system that captures the spirit of the simulation scenarios, but that also illustrates the
added value of verification. 

The mathematical model and the liveness requirement were the basis of the verifi- 
cation model. Building the verification model involved simulation of various models. 
We took into account that we needed a continuous-time model, with a small number 
of continuous variables, and dynamics that are numerically well-conditioned. The final 
result was a fourth-order hybrid system with non-linear dynamics.

Given the verification model we could not verify the requirement directly, but de-
composed the problem into smaller problems. For the ETC case study it was sufficient
to define a series of three problems that were solved by CheckMate. Finally, we had to 
find suitable verification parameters, which required several experiments with different 
settings. 

Given our experience from the ETC case study, future research should focus on 
supporting the process described in this paper. Hybrid systems verification will in the 
foreseeable future not become a completely automated process. There is a lot of work 
currently focussing on automating and supporting particular steps, but little that aims to 
support the complete process. Tool support can be useful to guide and assist the designer 
throughout the process that leads from informal description to verification result. At the 
same time it can help to make this process transparent, such that the steps and choices 
can be re-evaluated at a later stage.



Abstract. While model checking suffers from the state space explosion
problem, theorem proving is quite tedious and impractical for verifying 
complex designs. In this work, we present a verification framework in 
which we attempt to strike the balance between the expressiveness of 
theorem proving and the efficiency and automation of state exploration 
techniques. To this end, we propose to integrate a layer of checking 
algorithms based on Multiway Decision Graphs (MDG) in the HOL 
theorem prover. We deeply embedded the MDG underlying logic in HOL 
and implemented a platform that provides a set of algorithms allowing 
the user to develop his/her own state-exploration based application 
inside HOL. While the verification problem is specified in HOL, the
proof is derived by tightly combining the MDG based computations 
and the theorem prover facilities. We have been able to implement and 
experiment with different state exploration techniques within HOL such 
as MDG reachability analysis, equivalence and model checking. 



1 Introduction 

Whenever an error creeps into a design, time and money must be spent to lo-
cate the problem and correct it, and the longer a bug evades detection, the
harder and more expensive it is to fix. As design complexity increases, simulation 
times become prohibitive and coverage becomes poor, allowing numerous bugs 
to slip through to later stages of the design cycle. What is needed, therefore, 
is a complement to simulation for determining the correctness of a design. For 
this reason, there has been a surge of research interest in formal verification 
techniques. In general, a formal verification problem consists of mathematically 
establishing that an implementation satisfies a specification. The implementa- 
tion refers to the system design that is to be verified and the specification refers 
to the property with respect to which the correctness is to be determined. 

Formal verification methods fall into two categories: proof-based methods,
mainly theorem proving, and state-exploration methods, mainly model checking
and equivalence checking. While theorem proving is a scalable technique that can
handle large designs, model checking suffers from the so-called state-explosion
problem which prevents its application to industrial systems. On the other hand,
while model checking is fully automatic, deriving proofs is a user guided tech-
nique that requires a lot of expertise and hence can be tedious and difficult. Neither
technique allows the automatic verification of large systems. So, various
compromises are being explored to combine the strengths of both. They can
be summarized as: (i) tools integration, (ii) adding deduction rules to a state-
of-the-art checking tool, or (iii) deeply embedding checking algorithms inside a
theorem prover. For the first approach, we start with two stand-alone tools, a 
theorem prover and a checking tool, where we link the latter to the theorem 
prover using scripting languages to be able to automatically verify small sub- 
goals generated by the theorem prover from a large system. The starting point 
of the second approach is an automatic (model) checker to which we add proving 
rules to hopefully extend the verification to complete systems. Finally, the third 
approach, which is the one we adopt in our work, consists of embedding algorith- 
mic infrastructures inside a theorem prover resulting in a hybrid system tightly 
combining checking algorithms and proving facilities. This approach differs from 
the first one in the way the verification is performed. In fact, we do not use an 
external checking tool; instead, we develop state-exploration algorithms inside 
the theorem prover. 

In this work, we developed a platform of state-exploration algorithms inside 
the HOL proof system [9]. Our decision diagram data structure is the Multiway 
Decision Graphs (MDGs) [4], which we integrate in HOL as a built-in datatype. 
The logic underlying MDGs is embedded as a theory that provides the tools 
to specify the verification problem in the logic supported by the MDGs. The 
specification consists of a set of HOL formulae that can be represented by their 
corresponding MDGs. Operations over these formulae are viewed as MDG 
operations over their respective graphs. An MDG package is then used to build 
the graph representation of HOL formulae allowing the manipulation of graphs 
rather than HOL terms. Once available inside the theorem prover, the MDG 
data structure and operators can be used to automate parts of the verification 
problem or even to write state enumeration algorithms like reachability analysis 
or model checking. 

The organization of this paper is as follows: Section 2 reviews some related 
work. Section 3 describes the embedding of the logic underlying the MDGs in 
HOL. Section 4 shows how HOL is linked to the MDG package. In Section 5, we 
describe the embedding of the reachability analysis procedure. Sections 6 and 7 
illustrate the use of the embedding in the implementation of state-exploration 
algorithms and decision procedures, respectively. Section 8, finally, concludes the 
paper and gives some future research directions. 



2 Related Work 

The quest for an efficient combination of theorem proving and model checking 
has long been one of the major challenges in the field of formal verification. The 
work described here has been strongly influenced by the HolBdd [7,8] system 
developed by Gordon. HolBdd consists of a platform allowing the programming 
of Binary Decision Diagrams (BDD) based symbolic algorithms in the Hol98 
proof assistant. It provides intimate combinations of deduction and algorithmic 
verification. They use a small kernel of ML functions to convert between BDDs, 
terms and theorems. Their work was applied to perform reachability program- 
ming in Hol98. 

A pioneering work in the area of linking theorem proving with automated 
verification tools is the one of Joyce and Seger [11] combining HOL and the sym- 
bolic trajectory evaluation (STE) tool VOSS. A HOL tactic, VOSS-TAC, calls 
the VOSS system as a child process of the HOL system to check whether an 
assertion, expressed as a term of higher-order logic, is true. Early experiments 
with HOL-VOSS suggested that a lighter theorem prover component was suffi- 
cient, since all that was needed was a way of combining results obtained from 
STE. A system based on this idea, called Voss-ThmTac, was later developed by 
Aagaard et al. [1], which combines the ThmTac theorem prover with the VOSS 
system. Its power comes from the very tight integration of the two provers, using 
a single language, FL, as both the theorem prover’s meta-language and its object 
language. 

Rajan et al. [18] described an approach where a BDD based model checker 
for the propositional μ-calculus has been used as a decision procedure within the 
framework of the PVS proof checker [16]. They used the μ-calculus as a medium for 
communicating between PVS and the model checker. Temporal operators are 
given the customary fixpoint definitions using the μ-calculus. These expressions 
were translated to the form required by the model checker. 

Hurd [10] used PROSPER [5] to combine the Gandalf first-order theorem 
prover with HOL. A HOL tactic, GANDALF_TAC, is used to enable first-order 
HOL goals to be proven by Gandalf and mirror the resulting proofs in HOL. It 
takes the original goal, converts it to the appropriate format, and sends it to 
Gandalf. Gandalf then parses the proof, translates it to a HOL proof and proves 
the original goal in HOL. 

Schneider and Hoffmann [19] linked the SMV model checker [13] to HOL 
using PROSPER. They embedded the linear time temporal logic (LTL) in HOL 
and translated LTL formulae into equivalent ω-automata, a form that can be 
reasoned about within SMV. The translation is completely implemented by means 
of HOL rules. The deep embedding of the SMV specification language in HOL 
allows LTL specifications to be manipulated in HOL. 

In [12], [17] and later [15] a hybrid tool and a methodology tailored to perform 
hierarchical hardware verification have been developed by the Hardware 
Verification Group of Concordia University. The hybrid tool, called HOL-MDG, 
integrates the HOL theorem prover with the MDG tool by performing equivalence 
and model checking using two HOL tactics, MDG_EQ_TAC and MDG_MC_TAC, 
respectively. In case the design is large enough to cause state explosion, and 
assuming a hierarchical model, a tactic HIER_VERIF_TAC is called to break the 
design into sub-blocks. The same procedure is recursively applied if necessary. 
At any point, the goal proof can be done in HOL. 

While [12,15,17] describe systems integrating two stand-alone tools, namely, 
HOL and an external MDG tool, the work described in this paper is not intended 
to use an external tool to verify subgoals. Instead, MDGs are defined as a built- 
in datatype of HOL and operators over MDGs are available in the proof system, 
which allows us to tightly combine HOL deduction and MDG computations. 
In addition, state-exploration algorithms are written inside HOL. Hence, 
the main difference between our approach and the HOL-MDG tool is that our 
embedding provides a secure and general programming infrastructure that allows 
users to implement their own MDG-based verification algorithms inside the 
HOL system. 

The work in [1,10,11,19] uses the same approach as the HOL-MDG hybrid tool 
in the way it integrates the model checker with the theorem prover. The work 
in [18] uses the μ-calculus as a medium for communicating between the theorem 
prover and the model checker. It is a shallow embedding of a stand-alone tool’s 
language, while ours is a deep embedding in which the decision diagram data 
structure and its operators are embedded inside the theorem prover. 

The work most closely related to ours is that of Gordon [7,8]. Our work, 
however, deals with embedding MDGs rather than BDDs. In fact, while BDDs 
are widely used in state-exploration methods, they can only represent Boolean 
formulae. MDGs, however, represent a subset of first-order terms allowing the 
abstract representation of data and hence raising the level of abstraction. 

3 Embedding the MDG Logic in HOL 

3.1 Multiway Decision Graphs 

A Multiway Decision Graph (MDG) is a finite directed acyclic graph G where 
the leaf nodes are labeled by formulae, the internal nodes are labeled by terms, 
and the edges issuing from an internal node N are labeled by terms of the same 
sort as the label of N. Such a graph represents a formula defined inductively 
as follows: (i) if G consists of a single node labeled by a formula P, then G 
represents P; (ii) if G has a root node labeled A with edges labeled $B_1, \ldots, B_n$ 
leading to subgraphs $G'_1, \ldots, G'_n$, and if each $G'_i$ represents a formula $P_i$, then G 
represents the formula $\bigvee_{1 \le i \le n} ((A = B_i) \wedge P_i)$. 
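
To make the inductive definition concrete, the following Python sketch computes the formula represented by a small graph. It is only an illustration under stated assumptions: the classes Leaf and Node and the function formula_of are hypothetical names that belong neither to the MDG package nor to HOL, labels are plain strings, and none of the well-formedness conditions is checked.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    formula: str  # formula labelling a leaf node (hypothetical encoding)


@dataclass(frozen=True)
class Node:
    label: str  # term A labelling an internal node
    # each edge is a pair (edge label B_i, subgraph G'_i)
    edges: Tuple[Tuple[str, "Graph"], ...]


Graph = Union[Leaf, Node]


def formula_of(g: Graph) -> str:
    """Return the formula represented by g, following cases (i) and (ii)."""
    if isinstance(g, Leaf):
        return g.formula  # case (i): a single node labelled by P represents P
    # case (ii): disjunction over the edges of (A = B_i) /\ P_i
    disjuncts = [f"(({g.label} = {b}) /\\ {formula_of(sub)})" for b, sub in g.edges]
    return " \\/ ".join(disjuncts)


example = Node("x", (("stop", Leaf("T")), ("run", Node("y", (("0", Leaf("T")),)))))
print(formula_of(example))
```

The recursion mirrors the two cases of the definition directly; a real MDG package would in addition share subgraphs and enforce an ordering on node labels.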

The above is of course too general; a set of well-formedness conditions [4] 
turns MDGs into canonical representations that can be manipulated by efficient 
algorithms. Multiway Decision Graphs are intended to represent Abstract State 
Machines (ASM) [4], an abstract description of state machines based on a 
many-sorted first-order logic with a distinction between abstract and concrete sorts. 
It is then possible to let nodes range over abstract sorts for which there is no 
enumerable set of edges, and to use non-mutually-exclusive first-order terms as 
edge labels. More details on MDG are described in the sections to follow. 

3.2 MDG Sorts 

An enumeration is a finite set of constants. While concrete sorts have 
enumerations, abstract sorts do not. This is embedded in HOL using two constructors 
called Concrete_Sort and Abstract_Sort. The former takes as arguments a sort 
name and its enumeration to define a concrete sort. For example, if state is a 
concrete sort with [stop; run] as enumeration, then this is declared in HOL by: 

⊢def state = Concrete_Sort "state" [stop;run] 

To define an abstract sort of type alpha (which means that the sort is actually 
abstract and hence can represent any HOL type) we use the Abstract_Sort 
constructor as follows: 

⊢def alpha = Abstract_Sort "alpha" 

To determine whether a sort is concrete or abstract, we use predicates over the 
sort constructors called IsConcreteSort and IsAbstractSort. These predicates 
will be used for instance to determine the sort of a variable or a function symbol. 

The vocabulary of the MDG based logic consists of concrete and generic 
constants, variables and function symbols (also called operators). The distinction 
between abstract and concrete sorts leads to a distinction between three kinds 
of function symbols. Let f be a function symbol of type 
$\alpha_1 \times \ldots \times \alpha_n \to \alpha_{n+1}$. If $\alpha_{n+1}$ is an abstract sort then f is an 
abstract function symbol. Abstract function symbols are used to denote data 
operations and are uninterpreted. If all $\alpha_1, \ldots, \alpha_{n+1}$ are concrete, f is a 
concrete function symbol. Concrete function symbols, and 
concrete constants as a special case, can always be entirely interpreted and thus 
be eliminated; for simplicity, we assume that they are not used. Finally, if $\alpha_{n+1}$ 
is concrete while at least one of $\alpha_1, \ldots, \alpha_n$ is abstract, then we refer to f as a 
cross-operator. 
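
The three-way distinction can be summarized by a small decision function. The Python sketch below is illustrative only; classify_function and its Boolean encoding of sorts (True for abstract, False for concrete) are assumptions made for the example and are not part of the embedding.

```python
from typing import Sequence


def classify_function(args_abstract: Sequence[bool], result_abstract: bool) -> str:
    """Classify a function symbol from the status of its argument and target sorts.

    args_abstract[i] is True when the i-th argument sort is abstract;
    result_abstract is True when the target sort is abstract.
    """
    if result_abstract:
        return "abstract"        # abstract target sort: uninterpreted data operation
    if any(args_abstract):
        return "cross-operator"  # concrete target, at least one abstract argument
    return "concrete"            # all sorts concrete: can be interpreted away
```

For instance, a comparison of two abstract values with a Boolean result is classified as a cross-operator, whereas a function returning the larger of two abstract values is classified as abstract.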

3.3 MDG Variables 

An abstract variable can be either a primary or a secondary variable. A primary 
variable labels a node in the graph while a secondary variable is an abstract 
variable occurring in the argument list of a function symbol. It can also be 
an abstract variable labeling an edge in the graph. In our embedding, a 
primary abstract variable is declared using the Abstract_Var constructor. The 
Secondary_Var constructor is used to declare a secondary variable. 

A variable is identified by its name and sort. For example, if x is a concrete 
variable of sort state, declared above, then this is written in HOL as follows: 

⊢def x = Concrete_Var "x" state 

Similarly, we use some predicates to determine whether a variable is concrete, ab- 
stract or secondary. They are called, respectively, IsConcreteVar, IsAbstractVar 
and IsSecondaryVar. 

3.4 MDG Constants 

A constant can be either an individual (concrete) constant or an abstract generic 
constant. The latter is identified by its name and its abstract sort. The individual 
constants can have multiple sorts depending on the enumerations in which 
they occur. We use the Individual_Const and Generic_Const constructors 
to declare constants in HOL. For example, the enumeration of the concrete 
sort state is [stop; run]; stop and run are two individual constants that have 
state as their sort. They must be already defined in order to be able to declare 
the sort state. To check whether a constant is an individual constant or an 
abstract generic constant, we define two predicates, IsIndividualConstant and 
IsGenericConstant. 



3.5 MDG Functions 

MDG functions can be either concrete, abstract or cross-operators. As mentioned 
before, concrete functions are not used since they can be eliminated by case 
splitting. Cross-functions are those that have at least one abstract argument. 
But when we focus on terms that are concretely reduced, all the sub-terms of a 
compound term (abstract/cross function) have to be abstract. In addition they 
are secondary variables. 

In general, a function is identified by its name, the sorts of its arguments 
and its sort. In this case, we specify the variables rather than sorts because we 
focus on cross-terms or abstract terms instead of the corresponding symbols. If 
equal is a function that checks if two abstract variables are equal, then, equal is 
a cross-function. 

⊢def bool = Concrete_Sort "bool" ["0";"1"] 

⊢def y1 = Secondary_Var "y1" alpha 
⊢def y2 = Secondary_Var "y2" alpha 
⊢def equal = Cross_Function "equal" [y1;y2] bool 

If max is a function that takes two abstract variables as arguments and returns 
the greater one, then max is an abstract function. 

⊢def max = Abstract_Function "max" [y1;y2] alpha 

The predicates IsAbstractFunction and IsCrossFunction are used to determine 
the nature of a compound term. 



3.6 MDG Terms 

MDG terms are either individual constants, generic constants, concrete or 
abstract variables, cross-operators or abstract function symbols. We provide a 
constructor called MDG_Term that is used every time a new term is declared. The 
single constructor is used so that terms will have the same type and hence can 
be used in equalities. In fact, if x is declared using the Concrete_Var constructor 
and stop using the Individual_Const constructor, we will not be able to write 
an equation of the form x = stop due to a type mismatch. However, such an 
equation is possible if both are declared using the same constructor. 

3.7 Well-Formed MDG Terms 



For BDDs to be canonical, they have to be reduced and ordered. Similarly, 
MDGs require certain well-formedness conditions to canonically represent the 
MDG terms. Such terms are called Directed Formulae (DF). Given two disjoint 
sets of variables U and V, a DF of type $U \to V$ is a formula in disjunctive normal 
form (DNF) such that 

1. Each disjunct is a conjunction of equations of the form: 

- A = a, where A is a cross-term of concrete sort $\alpha$ containing no 
variables other than elements of U, and a is an individual constant in the 
enumeration of $\alpha$, or 

- u = a, where $u \in U$ is a variable of concrete sort $\alpha$ and a is an individual 
constant in the enumeration of $\alpha$, or 

- v = a, where $v \in V$ is a variable of concrete sort $\alpha$ and a is an individual 
constant in the enumeration of $\alpha$, or 

- v = A, where $v \in V$ is a variable of abstract sort $\alpha$ and A is a term of 
type $\alpha$ containing no variables other than elements of U; 

2. In each disjunct, the left hand sides of the equations are pairwise distinct; 
and 

3. In each disjunct, every variable $v \in V$ should appear as the left hand side of 
an equation v = A. 



Conditions 2 and 3 must be respected by the user when specifying the 
verification problem. Condition 3 is less stringent than it seems. In practice, one can 
introduce an additional dependent variable u and add an equation v = u to a 
disjunct where an abstract v is missing. 

For example, we embedded condition 1 in HOL and check it using the 
function Well_formedTerm that, recursively, calls Well_formedEQ to check the 
well-formedness of an equation. In the definition below, eq is an equation of the form 
lhs = rhs. 



⊢def Well_formedEQ eq = 
  ((IsConcreteVar lhs) ∧ (IsConcreteConstant rhs)) ∨ 
  ((IsCrossFunction lhs) ∧ (IsConcreteConstant rhs)) ∨ 
  ((IsAbstractVar lhs) ∧ (IsAbstractFunction rhs)) ∨ 
  ((IsAbstractVar lhs) ∧ (IsAbstractVar rhs)) ∨ 
  ((IsAbstractVar lhs) ∧ (IsGenericC rhs)) ∨ 
  (isBool lhs) 
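
As a rough illustration of how such a check can be organised, the Python sketch below tests the shape of each equation (condition 1) and the pairwise distinctness of left hand sides within a disjunct (condition 2). Everything in it is an assumption made for the example: the Term class, the kind tags and the function names are hypothetical, the final isBool case of the definition is omitted, and the restriction on which variables may occur in a term is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Term:
    kind: str  # hypothetical tag, e.g. "concrete_var" or "abstract_function"
    name: str


# Accepted (kind of lhs, kind of rhs) pairs, one per disjunct of the definition.
ALLOWED = {
    ("concrete_var", "individual_const"),
    ("cross_function", "individual_const"),
    ("abstract_var", "abstract_function"),
    ("abstract_var", "abstract_var"),
    ("abstract_var", "generic_const"),
}


def well_formed_eq(lhs: Term, rhs: Term) -> bool:
    """Condition 1 for a single equation lhs = rhs."""
    return (lhs.kind, rhs.kind) in ALLOWED


def well_formed_disjunct(equations: List[Tuple[Term, Term]]) -> bool:
    """Condition 1 for every equation, plus condition 2 (distinct left hand sides)."""
    lhs_names = [lhs.name for lhs, _ in equations]
    shapes_ok = all(well_formed_eq(lhs, rhs) for lhs, rhs in equations)
    return shapes_ok and len(set(lhs_names)) == len(lhs_names)
```

With these definitions an equation such as x = stop is accepted when x is a concrete variable and stop an individual constant, while stop = x is rejected.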



4 Linking HOL to the MDG Package 

The MDG logic is embedded in HOL to make it possible to specify a verifica- 
tion problem in HOL in terms of formulae that can be represented by canonical 
MDGs. The next step would be to provide the necessary tools to build and ma- 
nipulate the graph representations of these formulae. This platform will consist 
of ML functions that call an MDG package as an external process. The package 
is invoked using a script file in which the different manipulations to be done 
in MDG are specified. For example, to perform the conjunction of a list of 
well-formed terms, we use the ML function Conj. This function calls an intermediate 
function to write the script file corresponding to a conjunction, then calls the 
specific MDG functions to perform the operation and eventually returns the result 
to HOL. The ML functions pass the script file to the MDG package using the 
system function. The package computes the result (an MDG) and then writes it 
in a file “mdghol.cK”. Using the function ReadMdgOutput, the result is retrieved. 
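
The write-script, call-process, read-result pattern described above can be sketched in Python as follows. This is not the ML code of the platform: run_package, the file names job.script and job.out, and the stand-in command are all hypothetical, and the stand-in merely upper-cases its input so that the example runs without an MDG installation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_package(script_text: str, command: List[str]) -> str:
    """Write a script file, run an external tool on it, and read back its output.

    The tool is assumed to take two arguments: the script path and the path of
    the file in which it must write its result.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "job.script"   # hypothetical file names
        result = Path(tmp) / "job.out"
        script.write_text(script_text)
        subprocess.run([*command, str(script), str(result)], check=True)
        return result.read_text()


# Stand-in for the external package: copies the script, upper-cased, to the result file.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
]

print(run_package("conj(g1,g2)", STAND_IN))
```

Passing file paths rather than piping data keeps the two tools loosely coupled, at the price of file-system traffic on every call.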



4.1 Constructing MDGs in HOL 

To construct the graph representation of a HOL term, we use the function 
termToMdg. Well-formedness conditions are first checked using the predicate 
Well_formedTerm; if they are violated an exception is raised, otherwise the 
function begins gathering the information needed to call the package. 

The first step is to determine the sorts of all the sub-terms using the 
function ToMdgSorts. If a sub-term is of concrete sort Sort, it is declared as 
conc_sort(Sort,Enum), where Enum is the enumeration of Sort. When an 
abstract sort, say alpha, is encountered, then it is declared by abs_sort(alpha). For 
example, if a term A includes a concrete variable of sort bool and an abstract 
variable of sort alpha, then ToMdgSorts returns the following list: 

[conc_sort(bool,[0,1]), abs_sort(alpha)]. 

The second step is to declare all the variables, functions and generic constants 
used in the term. A variable is declared by signal(label,sort). A generic constant 
is declared by gen_const(label,sort). When a function is encountered, both the 
secondary variables and the function symbol must be declared. The function 
symbol is declared as function(f,[sorts],sort), where sorts are the sorts of the 
secondary variables that are arguments of the function symbol f, and sort is its target sort. 
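
The two declaration steps amount to simple string generation. The Python sketch below shows one possible way to produce such declarations; the helper names are hypothetical, the exact spacing expected by the package is an assumption, and generic constants are left out.

```python
from typing import List


def declare_concrete_sort(name: str, enum: List[str]) -> str:
    """Declaration of a concrete sort together with its enumeration."""
    return f"conc_sort({name},[{','.join(enum)}])"


def declare_abstract_sort(name: str) -> str:
    """Declaration of an abstract sort (no enumeration)."""
    return f"abs_sort({name})"


def declare_signal(label: str, sort: str) -> str:
    """Declaration of a variable of the given sort."""
    return f"signal({label},{sort})"


def declare_function(symbol: str, arg_sorts: List[str], target: str) -> str:
    """Declaration of a function symbol with argument sorts and target sort."""
    return f"function({symbol},[{','.join(arg_sorts)}],{target})"


declarations = [
    declare_concrete_sort("bool", ["0", "1"]),
    declare_abstract_sort("alpha"),
    declare_signal("y1", "alpha"),
    declare_signal("y2", "alpha"),
    declare_function("equal", ["alpha", "alpha"], "bool"),
]
print("\n".join(declarations))
```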

Thereafter, termToMdg writes the variables order list in the script file and 
then calls the function header responsible for retrieving the list of the LHSs 
and RHSs of the equations in the term which will be the parameters of the 
mdg function. The latter is then called and the result is retrieved using the 
readMDGOutput function. Instead of returning the whole graph structure, we 
return only its ID, which will be used to map the term to its MDG representation. 



4.2 Embedding MDG Basic Operators 

The MDG operators are embedded, as well, to allow the manipulation of graphs 
rather than terms. We show below the basic MDG operators. 

- Conj: performs the conjunction of a set of graphs; 

- Disj: performs the disjunction of a set of graphs; 

- RelP (Relational Product): used for image computation. It takes the 
conjunction of a collection of MDGs, having pairwise disjoint sets of abstract 
primary variables, and existentially quantifies with respect to a set of 
variables, either abstract or concrete, that have primary occurrences in at least 
one of the graphs. In addition, it can rename some of the remaining primary 
variables according to a renaming substitution; 

- PbyS (Pruning By Subsumption): used to approximate the set difference 
operation. Informally, it removes all the paths of a graph P from another 
graph Q. 



5 Reachability Analysis in HOL 

The reachability analysis is embedded using the MDG operators interfaced to 
HOL. We show here the different steps to compute the set of the reachable states 
of an abstract state machine. 



5.1 Computing Next States 

Let I, B and R be, respectively, a set of inputs, a set of initial states of a machine 
and its transition relation. The set of next states reached from B with respect 
to the transition relation R is computed using the ML function ComputeNext. 
This is done by, first, computing the graphs of I, B and R. The RelP operator 
is then used after identifying the renaming substitution function and the set 
of inputs and state variables over which the MDG is quantified. The resulting 
graph represents the set of next states. 
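
For intuition, the image computation can be mimicked on explicit finite sets. The Python sketch below is an explicit-state stand-in and not an MDG computation: compute_next is a hypothetical function in which the conjunction of I, B and R becomes a filter over transition triples, the existential quantification over inputs and current-state variables becomes a projection, and the renaming of next-state variables is implicit in returning the third component.

```python
from typing import Hashable, Set, Tuple

# A transition is a triple (input, current state, next state).
Transition = Tuple[Hashable, Hashable, Hashable]


def compute_next(inputs: Set[Hashable], states: Set[Hashable],
                 relation: Set[Transition]) -> Set[Hashable]:
    """Image of `states` under `relation`, restricted to the given inputs."""
    return {nxt for (inp, cur, nxt) in relation if inp in inputs and cur in states}


# Hypothetical example: a modulo-4 counter with inputs "inc" and "hold".
COUNTER = ({("inc", n, (n + 1) % 4) for n in range(4)}
           | {("hold", n, n) for n in range(4)})

print(sorted(compute_next({"inc"}, {0}, COUNTER)))
```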



5.2 Computing Outputs 

The set of outputs corresponding to a set of initial states and inputs, with respect 
to an output relation O is computed in the same way as the next states. But 
instead of using the transition relation R of the machine, the output relation O 
is used. To every state of the machine and every set of data inputs corresponds 
a set of output values. These will be used to check if an invariant holds in the 
current state. 



5.3 Computing Frontier Set 

The frontier set is the set of newly visited states. If V represents the set of states 
already visited, N = ComputeNext(I, V, R) is the set of next states reached 
from V. In this case the frontier set is N \ V, which is computed by the ML 
function ComputeFrontier. The frontier set is used to check if all the states 
reachable by the machine are already reached. If this is the case (the frontier 
set is empty), then the reachability analysis terminates and the set of reachable 
states is returned. If the frontier set is not empty, then new states were visited 
during the last iteration. In this case, the analysis continues until reaching the 
fixpoint (set). 

5.4 Computing Reachable States 

The set of reachable states is the set of all the states that a machine can reach, 
starting from an initial state, for a certain set of inputs. For abstract state machines, the 
state space can be infinite. Hence, the set of reachable states may not exist¹. We 
implemented in HOL the solutions proposed in [2] to compute the set of reachable 
states, which we represented by the function ComputeReachable², defined in 
Figure 1. 



ComputeReachable G_I G_B G_R = 
  K = 0, S = G_B 
  loop 
    K = K + 1 
    N = ComputeNext G_IK G_B G_R 
    if ComputeFrontier N S = F then return success 
    G_B = ComputeFrontier N S 
    S = Disj N S 
  end loop 



Fig. 1. Reachability Analysis Algorithm 

ComputeReachable computes the set of reachable states S of a state machine 
described by its transition relation, starting from an initial state and for a certain 
data input. S is initialized to B (the initial state), and the sets of next-states 
are computed until reaching a fixpoint characterized by an empty frontier set. 
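
The fixpoint loop of Figure 1 can be illustrated on explicit finite sets, where termination is guaranteed. The Python sketch below is such a stand-in and does not use MDGs; the function names follow the figure only loosely, the same set of inputs is used at every iteration, and the modulo-4 counter is a made-up example.

```python
from typing import Hashable, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (input, current, next)


def compute_next(inputs, states, relation):
    """Successors of `states` under `relation` for the given inputs."""
    return {nxt for (inp, cur, nxt) in relation if inp in inputs and cur in states}


def compute_frontier(new_states, visited):
    """Newly visited states: set difference (approximated by PbyS on MDGs)."""
    return new_states - visited


def compute_reachable(inputs: Set[Hashable], initial: Set[Hashable],
                      relation: Set[Transition]) -> Set[Hashable]:
    """Iterate image computation until the frontier set is empty."""
    reached = set(initial)    # S in Figure 1
    frontier = set(initial)   # plays the role of G_B after the first iteration
    while True:
        nxt = compute_next(inputs, frontier, relation)
        frontier = compute_frontier(nxt, reached)
        if not frontier:      # empty frontier: fixpoint reached
            return reached
        reached |= nxt        # S = Disj N S


COUNTER = ({("inc", n, (n + 1) % 4) for n in range(4)}
           | {("hold", n, n) for n in range(4)})
print(sorted(compute_reachable({"inc"}, {0}, COUNTER)))
```

Unlike Figure 1, which returns success, this version returns the reached set itself so that the result can be inspected.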



6 Invariant and Model Checking in HOL 

6.1 Invariant Checking 

Invariant checking is a direct application of the reachability analysis algorithm. 
It consists of checking that a property or an invariant holds on the outputs 
of a state machine in every reachable state. First, the invariant is checked in 
the initial state. This is done by computing the outputs corresponding to that 
state and then using the MDG operators to check that these outputs satisfy the 
invariant. After that, next-states are computed and for every state reached, the 
invariant is checked on the outputs. In a given iteration, if the outputs of the 
machine satisfy the invariant, then the procedure continues for the next-state. 
If, on the other hand, the invariant does not hold, the analysis terminates and 
a failure is reported. A counterexample can be generated to trace the error. 

¹ This is the well-known non-termination problem in MDG, which is discussed in [2], 
where various heuristics to solve it are provided. 

² For the sake of clarity, this is just a simplified version of the algorithm. 

We implemented the invariant checking algorithm in HOL as an ML function 
Invariantchecking which takes as arguments: 

- Tr: the transition relation specified as a list of directed formulae; 

- Or: the output relation specified by a directed formula; 

- In: the initial state specified by a directed formula; 

- Inputs: the input variables list; 

- States: the state variables list; 

- NxStates: the next-state variables list corresponding to States; 

- Inv: the invariant to be checked, specified as a directed formula. 

The function Invariantchecking is defined in Figure 2. 
It first builds the graphs of the transition relation, the output relation, the initial 
state and the invariant using the function termToMdg. It then generates the input 
graph. After that, the outputs are computed using NewOutputs and then the 
invariant is checked. If the invariant holds, the next states are computed 
using ComputeNext. Checking the frontier set will cause the termination of the 
analysis or another iteration. 



Invariantchecking Tr Or In Inputs States NxStates Inv = 
  // builds the MDG representations 
  K = 0, S = G_In, R = G_In 
  loop 
    K = K + 1 
    // generates the input graph G_IK 
    Os = ComputeOutputs G_Or R G_IK 
    if (PbyS Os G_Inv) ≠ F return failure 
    N = ComputeNext G_IK R G_Tr 
    if ComputeFrontier N S = F then return success 
    R = ComputeFrontier N S 
    S = Disj N S 
  end loop 
end Invariantchecking; 



Fig. 2. Invariant Checking Algorithm 
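
The structure of the invariant checking loop can be illustrated on explicit finite sets. The Python sketch below is a simplified stand-in and not the ML implementation: the output relation is reduced to a function of the state alone, the invariant is a Python predicate on outputs, the same inputs are used at every iteration, and the violating states are returned in place of a counterexample trace. All names are hypothetical.

```python
from typing import Callable, Hashable, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (input, current, next)


def invariant_checking(inputs: Set[Hashable], initial: Set[Hashable],
                       relation: Set[Transition],
                       output_of: Callable[[Hashable], Hashable],
                       invariant: Callable[[Hashable], bool]):
    """Return (True, set()) if the invariant holds in every reachable state,
    otherwise (False, violating states found in the current frontier)."""
    reached = set(initial)
    frontier = set(initial)
    while True:
        # check the invariant on the outputs of the newly visited states
        bad = {s for s in frontier if not invariant(output_of(s))}
        if bad:
            return False, bad
        nxt = {n for (i, c, n) in relation if i in inputs and c in frontier}
        frontier = nxt - reached
        if not frontier:          # nothing new: every reachable state was checked
            return True, set()
        reached |= nxt


COUNTER = ({("inc", n, (n + 1) % 4) for n in range(4)}
           | {("hold", n, n) for n in range(4)})
print(invariant_checking({"inc", "hold"}, {0}, COUNTER, lambda s: s, lambda o: o < 4))
```

Checking only the frontier at each iteration avoids re-examining states whose outputs were already validated.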



6.2 Model Checking 

MDG temporal operators can be implemented in HOL for model checking. In 
Figure 3 we present, for illustration purposes, how the operator AF on a first- 
order property formula P [20] is embedded. All the MDG model checking algo- 
rithms are embedded in HOL in a similar fashion. More details can be found 
in [14]. 

Check_AF Tr In Inputs States NxStates P = 
  // builds the MDG representations G_Tr, G_In, G_P 
  K = 0, E = F, C = G_In 
  // E contains sets of states not satisfying P 
  loop 
    Q = ComputeFrontier C G_P 
    // removes states satisfying P 
    if Q = F then return success 
    if ComputeFrontier E Q ≠ E then return failure 
    E = Disj E Q 
    K = K + 1 
    C = ComputeNext G_IK Q G_Tr 
  end loop 
end Check_AF; 



Fig. 3. Model Checking Algorithm for AF 
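
For comparison, the meaning of AF on a finite, explicitly enumerated system can be computed with the customary least-fixpoint characterization AF P = μZ. P ∨ AX Z. The Python sketch below implements that textbook fixpoint and is not the MDG algorithm of Figure 3; check_af, the successor map and the example systems are assumptions made for illustration, and a state without successors is treated as not satisfying AX.

```python
from typing import Callable, Dict, Hashable, Set


def check_af(succ: Dict[Hashable, Set[Hashable]], initial: Set[Hashable],
             p: Callable[[Hashable], bool]) -> bool:
    """True iff P eventually holds on every path from every initial state.

    succ maps each state to the set of its successors.
    """
    z = {s for s in succ if p(s)}  # states where P already holds
    while True:
        # add states all of whose successors are known to lead to P
        new = {s for s in succ if s not in z and succ[s] and succ[s] <= z}
        if not new:
            break
        z |= new
    return initial <= z


# Hypothetical examples: a chain 0 -> 1 -> 2 (looping on 2), with and
# without an extra self-loop on state 1.
CHAIN = {0: {1}, 1: {2}, 2: {2}}
WITH_LOOP = {0: {1}, 1: {1, 2}, 2: {2}}
print(check_af(CHAIN, {0}, lambda s: s == 2), check_af(WITH_LOOP, {0}, lambda s: s == 2))
```

The self-loop on state 1 creates a path on which P never holds, which is exactly the situation a failing AF check must detect.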



6.3 Application 

We have experimented with our embedding on some benchmark examples and case 
studies, including the Island Tunnel Controller (ITC), which was originally 
introduced by Fisler and Johnson [6]. The ITC controls vehicle traffic in a one-lane 
tunnel connecting the mainland to a small island. The ITC is specified using 
three communicating controllers and two abstract counters. We used the 
invariant checking procedure discussed above to verify a number of properties on the 
ITC. For each property, we derived those transition relations and variables 
involved in the property (specified manually) and let the property checking run 
automatically from within HOL. This reduces the verification problem and 
promotes hierarchical verification. In fact, every module of the design can be treated 
separately, which considerably improves the performance of the verification task in 
terms of memory usage compared to verifying the whole system in MDG. 
Memory usage is one of the most challenging factors 
in formal verification as it is the cause of the state-space explosion problem. 

Table 1 summarizes the verification results of checking a set of properties 
on ITC using: (1) pure MDG, (2) the MDG-HOL shallow embedding approach 
[15], and (3) our MDG-HOL deep embedding. A * beside a property means 
that the property failed the invariant check, in which case a counterexample is 
generated. Experiments were run on an Ultra2 Sun workstation with a 296 MHz CPU 
and 768 MB of memory. The CPU times represent the system times to perform the 
reachability analysis. They also include the time to translate the HOL 
specification to MDG files in the case of our HOL-MDG deep embedding. The memory 
usage statistics represent the total memory used by the MDG tool to build the 
different graphs. 

Table 1. Performance comparison for the ITC benchmark 

              CPU time                          Memory (MB) 
Property      MDG    HOL-MDG     HOL-MDG       MDG    HOL-MDG     HOL-MDG 
                     (Shallow)   (Deep)               (Shallow)   (Deep) 
Property1     0.87   1.15        101.9         0.66   0.47        0.220 
*Property2    0.55   0.87        52.8          0.27   0.23        0.013 
Property3     0.57   0.91        54.9          0.31   0.26        0.077 
Property4     0.53   0.81        44.3          0.23   0.22        0.02 
Property5     0.71   1.04        72.0          0.37   0.32        0.058 
Property6     0.53   0.80        44.8          0.24   0.17        0.035 
Property7     0.69   0.96        63.9          0.33   0.27        0.039 
Property8     0.70   0.98        64.2          0.32   0.27        0.039 
Property9     0.54   0.85        45.4          0.21   0.15        0.035 

As expected, when the verification can be handled using pure 
MDG or shallow embedding, the CPU time is lower compared to the time needed 
for verifying using our deep embedding of MDG in HOL. However, the latter 
remains reasonable especially when the timing is not the major performance 
measure. On the other hand, we notice the drastic memory usage reduction 
provided by the deep combination of HOL and MDG compared to using pure MDG 
or the shallow embedding of MDG in HOL. This reduction can be explained by 
the fact that we are considering the graphs of each directed formula separately 
instead of working with the whole system to verify, leading to a better garbage 
collection. This advantage is crucial because it would enable the handling of 
larger designs. More details about the ITC models and properties specification 
and verification can be found in [14]. 

7 MDG as a Decision Procedure 

The multiway decision graphs are a canonical representation of the directed for- 
mulae. Two directed formulae are equivalent if and only if they are represented 
by the same graph for a fixed order. This property can be used to prove auto- 
matically the equivalence of HOL terms or to check that a formula is a tautology 
in case it is represented by the MDG true. 

7.1 Combinational Equivalence Checking 

We provide here a decision procedure that enables us to verify automatically the 
equivalence of a certain subset of first-order HOL terms. This is performed using 
the ML function EquivCheck. 

⊢def EquivCheck order t1 t2 = 

  let s1 = termToMdg order t1 
      s2 = termToMdg order t2 
  in (s1=s2) 

Using EquivCheck we write an oracle that builds a theorem stating the 
equivalence between terms. The theorem is not derived from axioms and inference rules, 
which may endanger the security provided by the HOL reasoning style. For this 
reason, theorems created using the oracle are tagged so that an error can be traced 
whenever it occurs. Decision procedures of this kind are widely used to introduce some 
automation into theorem provers. 

7.2 Tautology Checking 

A formula is a tautology if it is represented by the MDG T. This makes the 
check very easy for the subset we consider, which are the directed formulae. We 
use the ML function Tautology which we implemented in HOL. 

⊢def Tautology order t =
       let s = termToMdg order t
       in (isTrue s)
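
The idea behind both decision procedures, comparing canonical representations built under a fixed order, can be illustrated outside HOL. The following Python sketch is only an analogy under stated assumptions: it swaps in truth tables over Boolean variables as the canonical form instead of MDGs, and the names `canon`, `equiv_check`, and `tautology` are hypothetical stand-ins for termToMdg, EquivCheck, and Tautology.

```python
from itertools import product

def canon(order, term):
    """Truth table of `term` under the fixed variable `order` (a canonical form)."""
    return tuple(
        bool(term(dict(zip(order, values))))
        for values in product([False, True], repeat=len(order))
    )

def equiv_check(order, t1, t2):
    """Two terms are equivalent iff their canonical forms coincide."""
    return canon(order, t1) == canon(order, t2)

def tautology(order, t):
    """A term is a tautology iff its canonical form is constantly true."""
    return all(canon(order, t))

# De Morgan's law as a small usage example.
order = ["a", "b"]
lhs = lambda env: not (env["a"] and env["b"])
rhs = lambda env: (not env["a"]) or (not env["b"])
print(equiv_check(order, lhs, rhs))
print(tautology(order, lambda env: env["a"] or not env["a"]))
```

As with MDGs, the comparison is only meaningful when both terms are translated under the same order.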

8 Conclusions and Future Work 

The need for expertise and user guidance is a major obstacle to applying
theorem proving to even the most trivial systems. On the other hand, state
exploration techniques suffer from the state space explosion problem, which
limits their application to industrial designs. An alternative to these
techniques is to combine the advantages of both in a hybrid approach that,
ideally, yields an automatic or semi-automatic technique that can handle large
designs. In this paper, we
proposed an approach that allows certain verification problems, specified in the 
HOL theorem prover, to be verified totally or in part using state-exploration 
algorithms. Our approach consists of an infrastructure of decision diagram data
structures and operators made available in HOL, which allows users to
develop their own state-exploration algorithms in the HOL proof system. The
data structure we considered in our work is the multiway decision graph (MDG).
MDGs extend the well-known binary decision diagrams (BDDs) in that
they eliminate the state explosion problem introduced by the datapath.

The platform we provide allowed us to develop state-exploration algorithms 
inside HOL like reachability analysis, model checking and invariant checking pro- 
cedures. We also developed decision procedures based on the MDGs allowing the 
equivalence and tautology checking of a certain subset of HOL terms automat- 
ically. Finally, we demonstrated the feasibility of our approach by considering 
some case study examples, which we have been able to verify using a seamless 
interaction between HOL and MDG. 

The embedding of the MDGs in HOL opens the way to the development of 
a wide range of new verification applications combining the advantages of state- 
exploration techniques and theorem proving. There are many opportunities for 
further work on using this embedding for formal verification. For instance, MDG 
canonicity can be used in HOL for term simplification. In fact, when built, MDGs 
are reduced by construction. Retrieving the term represented by this graph gives
a simplification of the original one. The embedding can be used for the formal
proof of the soundness of the MDG algorithms extending the work in [21], 
where the correctness of the MDG system translators was proved, ensuring the 
correctness of the whole MDG system. A similar work was done in [3] to verify a 
SPIN model checking algorithm in HOL. Finally, the embedding can be enhanced 
by using the LCF style [9]. In this case, an MDG representation for a HOL term 
can only be derived using inference rules and trivial MDGs. The graph of a 
directed formula is derived from the graphs of its equations (trivial MDGs) and 
the MDG operators (inference rules). This restricts the scope of soundness to 
single operators, which are easier to get right [8].
Abstract. Researchers in formal methods have emphasized the need to
make specification analysis as automatic as possible and to provide an 
array of tools in a uniform setting. Athena is a new interactive proof 
system that supports specification, structured natural deduction proofs, 
and trusted tactics. It places heavy emphasis on automation, seamlessly 
incorporating off-the-shelf state-of-the-art tools for model generation and 
automated theorem proving. We use a case study of railroad safety to 
illustrate several aspects of Athena. A formal specification of a rail- 
road system is given in Athena’s multi-sorted first-order logic. Automatic 
model generation is used abductively to develop from scratch a policy for 
controlling the movement of trains on the tracks. The safety of the pol- 
icy is proved automatically. Finally, a structured high-level proof of the 
policy’s correctness is presented in Athena’s natural deduction calculus. 



1 Introduction 

Logic has been called “the Calculus of Computer Science” [9,20]: just as Calcu- 
lus and differential equations can be used to model the behavior of continuous 
physical systems, the language of mathematical logic can be used as a succinct 
and unambiguous notation for specifying the structure and behavior of discrete 
systems. Once we have obtained a logical specification of such a system in the
form of a logical formula $P_1 \land \cdots \land P_n$, we can begin to ask various questions:

1. Is the specification consistent? That is, does the formula $P_1 \land \cdots \land P_n$ have
a model?

2. Does the specified system have a desired property $P$? That is, does the
specification $P_1 \land \cdots \land P_n$ logically imply $P$?

Two different techniques are used to answer these questions: model generation 
and theorem proving. Model generation can be used to answer the first question 
positively by exhibiting a model for $P_1 \land \cdots \land P_n$. Theorem proving can be used
to settle the second question positively by showing that the implication

$$P_1 \land \cdots \land P_n \Rightarrow P \qquad (1)$$

is a tautology. On the flip side, we can use theorem proving to settle the first
question negatively, by proving that the constant false follows logically from
$P_1 \land \cdots \land P_n$; and we can use model generation to settle the second question

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 294-309, 2004. 
negatively, by exhibiting a model for $P_1 \land \cdots \land P_n \land \neg P$, i.e., a countermodel
for (1). In view of this natural synergy it would appear useful to have both of 
these techniques available in one uniform setting, and indeed several researchers 
have made such suggestions [22,10]. The system that we discuss in this paper, 
Athena, offers both. 

Once we have concluded that our system has a desired property $P$, we are
left with the task of explaining why that is: why $P$ follows from $P_1 \land \cdots \land P_n$.
One possible answer is: “Because our favorite theorem prover said so.” That 
may be an acceptable answer, depending on the context of the application. But 
it provides no insight into our system and has little explanatory value. A much 
better alternative is to adduce a formal proof that shows how P is derived from 
the specification. Such a proof should be mechanically checkable in order to 
ensure its correctness. But it must also be structured [19]: It should be given 
in a natural deduction format, in a precisely defined language with a formal 
semantics, and should be expressed at a level of abstraction roughly equivalent 
to that of a rigorous proof in English. This brings us to our third topic, which 
is a major component of formal methods in its own right: the subject of proof 
representation and checking. 

Athena [1] is a new system that integrates all three of these elements: model 
generation; automated theorem proving; and structured proof representation and 
checking. It also provides a higher-order functional programming language, and 
a proof abstraction mechanism for expressing arbitrarily complicated inference 
methods in a way that guarantees soundness, akin to the tactics and tacticals 
of LCF-style systems^1 such as HOL [8] and Isabelle [24]. Proof automation is
achieved in two ways: first, through user-formulated proof methods; and second, 
through the seamless integration of state-of-the-art ATPs (such as Vampire [30] 
and Spass [31]) as primitive black boxes for general reasoning. For model genera- 
tion, Athena integrates Paradox [6], a new highly efficient model finder. For proof 
representation and checking, Athena uses a block-structured Fitch-style natural 
deduction calculus [25] with novel syntactic constructs and a formal semantics 
based on the abstraction of assumption bases [2]. 

In this paper we will illustrate all these aspects of Athena with a case study. 
We will develop a policy for controlling the movement of trains in a railroad 
system and prove that the policy is sound, in the sense that it satisfies a certain
notion of safety. The soundness of the policy is proved completely automatically 
by the off-the-shelf ATPs that Athena uses under the hood. ATP technology 
has made impressive strides over the last few years, and the systems that are
bundled with Athena (especially Vampire, the winner of the last few CASC
competitions) are now remarkably efficient. To give some perspective, it took
a senior CS student at MIT several solid hours of work to prove the main result 
of this case study; it took Vampire a fraction of a second. 



^1 An important difference of such systems from Athena is that they are based on
sequent calculi. By contrast, Athena uses a Fitch-style formulation of natural
deduction [25], which helps to make proofs and proof algorithms more perspicuous.
Beyond the completely automatic verification of the policy’s soundness, we
also provide a structured proof for it in Athena’s natural deduction framework,
which is then successfully machine-checked (also in less than one second).

Moreover, we show that model generation is useful not only for consistency 
checking and for debugging our specifications, but also for building them. In 
particular, we demonstrate an aggressive use of model generation that performs 
abduction in a way that helps not only to debug a safety policy, but to build it 
in the first place. 

In logical deduction, reasoning proceeds from the premises to the conclusion: 
We take a given number of propositions as premises and attempt to derive some 
desired conclusion from them. During system design, however, we are often faced 
with the problem in the reverse direction: We know the desired conclusion, but 
we are not sure what constraints would be required in order to ensure it. That is, 
we have a skeleton system description $P_1 \land \cdots \land P_n$ ready; and we have a desired
property $P$. What we wish to know is what additional constraints $Q_1, \ldots, Q_m$
are necessary in order to guarantee $P$, i.e., such that

$$\{P_1, \ldots, P_n\} \cup \{Q_1, \ldots, Q_m\} \vdash P.$$

That is the problem of abduction [16,15], which proceeds in the reverse direction 
from deduction. 

The following is a simple iterative procedure for this problem: 

1. Set $C = \{\mathit{true}\}$.

2. Try to prove $\{P_1, \ldots, P_n\} \cup C \vdash P$; if successful, halt and output $C$.

3. If unsuccessful, try to find a model for $\{P_1, \ldots, P_n\} \cup C \cup \{\neg P\}$.

4. If successful, use the information conveyed by that model to refine $C$
appropriately and then loop back to step 2; if unsuccessful, fail.

We illustrate this algorithm in Section 3. The individual steps of the algorithm 
are semi-mechanical, as the corresponding problems are unsolvable; but, with 
the aid of highly efficient tools, steps 2 and 3 can be greatly automated. The 
fourth step is the one requiring the most creativity, but the minimality of the 
countermodels produced in step 3 is very useful here: on every iteration through 
the loop, the simplest possible countermodel is produced, and this greatly fa- 
cilitates the conjecture of a general condition that rules out the countermodel. 
After a few iterations of successive refinement, we will eventually converge to an 
appropriate theory. 
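
The loop above can be sketched in Python for the propositional case. This is an illustration under simplifying assumptions, not Athena's mechanism: with finitely many Boolean variables, steps 2 and 3 collapse into one exhaustive search (no countermodel means the goal is entailed), and the creative step 4 is delegated to a caller-supplied `refine` function. All names, including the toy variables `gate_open`, `next_occupied`, and `collision`, are hypothetical.

```python
from itertools import product

def find_model(variables, formulas):
    """Return some assignment satisfying every formula, or None if there is none."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(f(env) for f in formulas):
            return env
    return None

def abduce(variables, premises, goal, refine, max_rounds=10):
    constraints = [lambda env: True]              # step 1: C = {true}
    negated_goal = lambda env: not goal(env)
    for _ in range(max_rounds):
        counter = find_model(variables, premises + constraints + [negated_goal])
        if counter is None:                       # step 2: no countermodel, goal follows
            return constraints
        extra = refine(counter)                   # step 4: rule out the countermodel
        if extra is None:
            return None                           # refinement failed
        constraints.append(extra)                 # loop back to step 2
    return None

variables = ["gate_open", "next_occupied", "collision"]
premises = [lambda e: e["collision"] == (e["gate_open"] and e["next_occupied"])]
goal = lambda e: not e["collision"]

def refine(counter):
    # Hand-written refinement: forbid an open gate in front of an occupied segment.
    if counter["gate_open"] and counter["next_occupied"]:
        return lambda e: not (e["gate_open"] and e["next_occupied"])
    return None

policy = abduce(variables, premises, goal, refine)
```

In the first-order setting of this paper the search cannot be exhaustive, which is why a separate prover and model finder are needed for steps 2 and 3.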



2 Specification of an Abstract Railroad Model 

Our railroad model is based on an Alloy [13] case study by Daniel Jackson [12], 
which was in turn inspired by a presentation on modeling San Francisco’s BART
railway by Emmanuel Letier and Axel van Lamsweerde at a meeting of IFIP
Working Group 2.9 on Requirements Engineering in Switzerland, in February
2000.
We view a railroad abstractly by positing the existence of two domains Train
and Segment. That is, we assume we have a collection of trains and a collection 
of track segments on the ground. Every segment has a beginning and an end, 
and motion on it proceeds in one direction only, from the beginning towards 
the end. Therefore, segments are unidirectional. Of course trains may move in 
opposite directions on different segments; but on any given segment trains move 
in one direction only. At the end of each segment there is a gate, which may be 
either open or closed. Gates will be used to control train motion. 



2.1 Railroad Topology 

Segments can be connected to one another, with the end of one segment attached 
to the beginning of another, and it is this connectivity that creates an organized 
railroad out of a collection of segments. We will capture this connectivity with 
a binary relation $\mathit{succ} \subseteq \mathit{Segment} \times \mathit{Segment}$. The intended meaning is simple:
$\mathit{succ}(s_1, s_2)$ holds iff $s_2$ is a “successor” of $s_1$, i.e., iff the end of $s_1$ is connected
to the beginning of $s_2$. A segment might have several successors. In general,
multiple segments might end at the same junction and fork off into multiple
successor segments. We stipulate that succ is irreflexive, so that no segment
loops back into itself, and intransitive, which is an obvious physical constraint.

Two segments may overlap, meaning that there is some piece of track, 
however small, that is shared by both segments. Segments that cross, for
instance, will be considered overlapping. We model this with a binary relation
$\mathit{overlaps} \subseteq \mathit{Segment} \times \mathit{Segment}$. We will make two useful assumptions about
this relation, reflexivity and symmetry. Clearly, both assumptions are consistent
with the intended physical interpretation of overlaps. We thus have four axioms
so far:



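$$\forall s .\ \neg\mathit{succ}(s, s) \qquad (2)$$

$$\forall s_1, s_2, s_3 .\ \mathit{succ}(s_1, s_2) \land \mathit{succ}(s_2, s_3) \Rightarrow \neg\mathit{succ}(s_1, s_3) \qquad (3)$$

$$\forall s .\ \mathit{overlaps}(s, s) \qquad (4)$$

$$\forall s_1, s_2 .\ \mathit{overlaps}(s_1, s_2) \Rightarrow \mathit{overlaps}(s_2, s_1) \qquad (5)$$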
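
On a finite model, axioms (2) through (5) can be checked directly. The following Python sketch assumes a hypothetical encoding in which each relation is a set of pairs; the function name and the three-segment example topology are illustrative and not taken from the case study.

```python
from itertools import product

def satisfies_topology_axioms(segments, succ, overlaps):
    """Check axioms (2)-(5) on a finite model; relations are sets of pairs."""
    irreflexive = all((s, s) not in succ for s in segments)             # (2)
    intransitive = all(
        not ((a, b) in succ and (b, c) in succ and (a, c) in succ)
        for a, b, c in product(segments, repeat=3)
    )                                                                   # (3)
    reflexive = all((s, s) in overlaps for s in segments)               # (4)
    symmetric = all((b, a) in overlaps for (a, b) in overlaps)          # (5)
    return irreflexive and intransitive and reflexive and symmetric

segments = ["s1", "s2", "s3"]
loop = {("s1", "s2"), ("s2", "s3"), ("s3", "s1")}       # a three-segment cycle
overlaps = {(s, s) for s in segments} | {("s1", "s2"), ("s2", "s1")}
shortcut = loop | {("s1", "s3")}                        # adds a transitive edge

print(satisfies_topology_axioms(segments, loop, overlaps))
print(satisfies_topology_axioms(segments, shortcut, overlaps))
```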



2.2 Capturing the State of the System 

How do we capture a configuration of the railroad system at a given point in 
time? In order to know the state of the system we need to know two things. 
First, the distribution of the trains on the segments. That is, for each train t 
we need to know what segment t is on. And second, for each segment, we need 
to know whether its gate is open or closed. For our purposes, the state will be 
completely determined by these two pieces of information. Accordingly, we posit 
a domain State, a function 



$$\mathit{segOf} : \mathit{Train} \times \mathit{State} \rightarrow \mathit{Segment}$$

and a relation $\mathit{closed} \subseteq \mathit{Segment} \times \mathit{State}$. The interpretations are as stated
above: $\mathit{segOf}(t, x)$ denotes the segment on which $t$ is located in state $x$; and
$\mathit{closed}(s, x)$ holds iff the gate of segment $s$ is closed in state $x$.

It is useful to introduce an auxiliary relation $\mathit{occupied} \subseteq \mathit{Segment} \times \mathit{State}$
such that $\mathit{occupied}(s, x)$ holds iff segment $s$ is “occupied” in state $x$. We define
this explicitly as follows: $\forall s, x .\ \mathit{occupied}(s, x) \Leftrightarrow [\exists t .\ \mathit{segOf}(t, x) = s]$.

We will model train motion as a transition relation between states: 

$$\mathit{reachable} \subseteq \mathit{State} \times \mathit{State}$$

The idea is that $\mathit{reachable}(x, y)$ (“state $y$ is reachable from state $x$”) iff $y$ is
identical to $x$ except that some (possibly none) trains have moved to successor
segments, provided of course that they could make such a move. Specifically:

$$(\forall x, y)\ \mathit{reachable}(x, y) \Leftrightarrow [(\forall t)\ \mathit{segOf}(t, y) \neq \mathit{segOf}(t, x) \Rightarrow \mathit{succ}(\mathit{segOf}(t, x), \mathit{segOf}(t, y)) \land \neg\mathit{closed}(\mathit{segOf}(t, x), x)]$$

That is, in going from state x to y, a train t either didn’t move at all or else it 
had an open gate in state x and moved to a successor segment. 

This relational formulation is highly non-deterministic and allows for any 
physically possible transition from one state to another,^2 including cases where
only one train moves, where none do, where two or three of them do, etc. This 
non-determinism is desirable, since we want our model to cover as many scenarios 
as possible. 
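
The transition relation can be sketched as a Python predicate over explicit states. The `State` record (train positions plus the set of closed gates) and the tiny three-segment line are assumptions made for this illustration; only the condition inside `reachable` follows the definition above.

```python
from collections import namedtuple

# A state records where each train is and which segments have a closed gate.
State = namedtuple("State", ["seg_of", "closed"])

def reachable(x, y, succ):
    """Every train that changed segment moved to a successor through an open gate."""
    for train, before in x.seg_of.items():
        after = y.seg_of[train]
        if after != before:
            if (before, after) not in succ or before in x.closed:
                return False
    return True

succ = {("a", "b"), ("b", "c")}
x = State(seg_of={"t1": "a", "t2": "b"}, closed={"b"})
y_ok = State(seg_of={"t1": "b", "t2": "b"}, closed=set())       # t1 moves a -> b
y_blocked = State(seg_of={"t1": "a", "t2": "c"}, closed=set())  # t2 leaves a closed gate
y_jump = State(seg_of={"t1": "c", "t2": "b"}, closed=set())     # c is not a successor of a
```

Note that `y_ok` puts both trains on segment b: reachability alone says nothing about safety, which is exactly why a policy is needed.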



2.3 Safety 

We will consider a state safe iff no two trains are on overlapping segments: 

$$\forall x .\ \mathit{safe}(x) \Leftrightarrow [\forall t_1, t_2 .\ t_1 \neq t_2 \Rightarrow \neg\mathit{overlaps}(\mathit{segOf}(t_1, x), \mathit{segOf}(t_2, x))]$$



We can now ask what would be an appropriate policy for controlling train motion 
that would guarantee this safety criterion. We make this more precise as follows: 
We define a sound safety policy as any number of unary constraints on states
$C_1, \ldots, C_n$ such that for all states $x$ and $y$, if

1. $x$ is safe;

2. $x$ satisfies the constraints $C_1, \ldots, C_n$, i.e., $C_1(x), \ldots, C_n(x)$ hold; and

3. $y$ is reachable from $x$

then y is also safe. The problem now is to come up with state constraints that 
constitute a sound safety policy in this sense. 

^2 Modulo our simplifying assumptions, most notably, our assumption that moving
from one segment to another is instantaneous.
3 Abduction via Model Generation

Initially we may well be at a loss in guessing what constraints might be appro- 
priate. We will show how an efficient model finder can provide insight on how 
to proceed. 

Let us start out with the most trivial state constraint possible: the constant 
true. With this policy, our safety statement becomes: 

$$\forall x, y .\ [\mathit{safe}(x) \land \mathit{reachable}(x, y) \land \mathit{true}] \Rightarrow \mathit{safe}(y)$$

Athena uses a prefix s-expression syntax, so we can define this proposition as 
follows: 

(define policy-safety
  (forall ?x ?y
    (if (and (safe ?x)
             (reachable ?x ?y)
             true)
        (safe ?y))))

Predictably, this statement isn’t true, and when we try to prove it
automatically by issuing the method call (!prove policy-safety), we fail.

Now let us see why this does not hold. We will try to find a countermodel 
for this statement, and the details of that model will spell out why this trivial 
policy fails. Armed with that information, we can start developing a policy in 
increments by fixing the problems that are discovered by the model finder. 

We start by issuing the following command: 

(falsify policy-safety) (6) 

This command attempts to find a model for the collection of all the propositions 
in the current assumption base plus the negation of policy-safety. Within a few 
seconds, Athena informs us that a countermodel has been found, that is, a model 
in which all the propositions in the assumption base are true, but policy-safety 
is false. Athena displays the model by enumerating the elements of each sort and 
listing the extension of every function and predicate. In particular, command (6) 
results in the output shown in Figure 1. 

The countermodel consists of two states, state-1 and state-2. The second 
state is reachable from the first; and while the first state is safe, the second is 
not. Therefore, policy-safety is false in this model. The reason for the failure 
becomes evident when we inspect the details of the model. There are two seg- 
ments, each of which is a successor of the other, and two trains. In state-1, 
train-1 is on segment-1 and train-2 on segment-2, and the gate of segment-1
is open. Consequently, train-1 is free to move on to segment-2, and indeed in
state-2 we have both trains on the second segment, a violation of our safety
notion. Graphically, the situation is depicted in Figure 2. We use small rectan- 
gular boxes to represent trains. An open (closed) gate is indicated by the symbol 
(respectively, x). 
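
The countermodel of Figure 1 is small enough to replay by hand. The Python sketch below transcribes the extensions of succ, overlaps, segOf, and closed from Figure 1 into sets and dictionaries (the encoding and the function names are assumptions of this illustration) and re-evaluates the definitions of Section 2 on it, together with a candidate constraint requiring every predecessor of an occupied segment to have a closed gate.

```python
# Extensions transcribed from Figure 1.
succ = {("segment-1", "segment-2"), ("segment-2", "segment-1")}
overlaps = {("segment-1", "segment-1"), ("segment-2", "segment-2")}
seg_of = {
    "state-1": {"train-1": "segment-1", "train-2": "segment-2"},
    "state-2": {"train-1": "segment-2", "train-2": "segment-2"},
}
closed = {"state-1": set(), "state-2": {"segment-1"}}

def safe(x):
    """No two distinct trains are on overlapping segments in state x."""
    trains = list(seg_of[x])
    return all((seg_of[x][a], seg_of[x][b]) not in overlaps
               for a in trains for b in trains if a != b)

def reachable(x, y):
    """Each train stayed put or moved to a successor through a gate open in x."""
    return all(seg_of[y][t] == s or ((s, seg_of[y][t]) in succ and s not in closed[x])
               for t, s in seg_of[x].items())

def predecessors_closed(x):
    """Every predecessor of an occupied segment has a closed gate in x."""
    occupied = set(seg_of[x].values())
    return all(a in closed[x] for (a, b) in succ if b in occupied)

print(safe("state-1"), reachable("state-1", "state-2"), safe("state-2"))
print(predecessors_closed("state-1"))
```

The first state is safe and leads to an unsafe one, and it fails the candidate constraint, which is what suggests adopting that constraint.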
The issue is this: when a successor of a segment s is occupied, then s ought
to have a closed gate. This is clearly violated in the countermodel, and that is 
how the unsafe second state is obtained. Therefore, we formulate our first state 
constraint as follows: 



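$$C_1(x) \Leftrightarrow \forall s_1, s_2 .\ [\mathit{succ}(s_1, s_2) \land \mathit{occupied}(s_2, x) \Rightarrow \mathit{closed}(s_1, x)]$$

for arbitrary $x$. Accordingly, we redefine policy-safety to be the following
proposition: $\forall x, y .\ [\mathit{safe}(x) \land \mathit{reachable}(x, y) \land C_1(x)] \Rightarrow \mathit{safe}(y)$.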

When we try to prove this automatically, we fail, so we revert to the model 
finder. Issuing the command (falsify policy-safety) results in the counter- 
model shown in Figure 3 (we omit Athena’s textual presentation of the model 
for space reasons). Once again, there are two states, where the first one is safe 
while the second one is reachable from the first but unsafe. There are three 
segments, $s_1$, $s_2$, and $s_3$, where $\mathit{succ}(s_1, s_2)$, $\mathit{succ}(s_2, s_3)$, and $\mathit{succ}(s_3, s_1)$.
Further, $s_1$ overlaps both $s_2$ and $s_3$, while $s_2$ and $s_3$ do not overlap. And there are
two trains, $t_1$ and $t_2$. The first state is as depicted in the left half of Figure 3;
namely, $t_1$ is on $s_2$, which has a closed gate, while $t_2$ is on $s_3$, whose gate is open.
(Segment $s_1$ has a closed gate in this state, although that is immaterial since
$s_1$ is not occupied in this state.) Note, in addition, that this state satisfies our
constraint $C_1$. There is only one segment with an open gate, $s_3$. And $C_1$ allows
$s_3$ to have its gate open because $s_3$ does not have any occupied successors. The
only successor of $s_3$ is $s_1$, and there are no trains on $s_1$ in this state.

Now the unsafe second state, shown in the right half of Figure 3, is obtained
from the first state when $t_2$ moves from $s_3$ to $s_1$. This is permissible because
$s_3$ has an open gate in the first state. But the new state is unsafe because even
though $s_1$ has only one train on it, it nevertheless overlaps with $s_2$, which is
occupied by $t_1$. This violates our notion of safety, which deems a state safe
iff there are no overlapping segments occupied by distinct trains. Since $s_1$ and
$s_2$ are overlapping and occupied by distinct trains in the new state, the latter is
unsafe.



succ(segment-1, segment-1) = false
succ(segment-1, segment-2) = true
succ(segment-2, segment-1) = true
succ(segment-2, segment-2) = false

overlaps(segment-1, segment-1) = true
overlaps(segment-1, segment-2) = false
overlaps(segment-2, segment-1) = false
overlaps(segment-2, segment-2) = true

safe(state-1) = true
safe(state-2) = false

segOf(train-1, state-1) = segment-1
segOf(train-1, state-2) = segment-2
segOf(train-2, state-1) = segment-2
segOf(train-2, state-2) = segment-2

closed(segment-1, state-1) = false
closed(segment-1, state-2) = true
closed(segment-2, state-1) = false
closed(segment-2, state-2) = false

reachableFrom(state-1, state-1) = true
reachableFrom(state-1, state-2) = true
reachableFrom(state-2, state-1) = true
reachableFrom(state-2, state-2) = true



Fig. 1. Athena output displaying a countermodel. 
state-1 (safe) state-2 (unsafe)

Fig. 2. A countermodel falsifying the trivial safety constraint true. 



Thus we see that our initial constraint $C_1$ does not go far enough. It is not
enough to stipulate that a predecessor of an occupied segment must have a
closed gate; we must stipulate that a predecessor of a segment that overlaps with
an occupied segment must have its gate closed. This is a stronger condition. It
implies $C_1$, owing to our assumption that the overlaps relation is reflexive. So
we introduce a new constraint $C_1'$:

$$C_1'(x) \Leftrightarrow \forall s_1, s_2, s_3 .\ [\mathit{succ}(s_1, s_2) \land \mathit{overlaps}(s_2, s_3) \land \mathit{occupied}(s_3, x) \Rightarrow \mathit{closed}(s_1, x)]$$



and redefine policy-safety to be the proposition 

$$\forall x, y .\ [\mathit{safe}(x) \land \mathit{reachable}(x, y) \land C_1'(x)] \Rightarrow \mathit{safe}(y)$$
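
To see concretely how the strengthened constraint differs from the original one, the first state of the Figure 3 countermodel can be encoded and both constraints evaluated on it. In the Python sketch below, the set-of-pairs encoding and the function names `c1` and `c1_prime` are assumptions; the topology, train positions, and gate settings are those described in the text.

```python
segments = ("s1", "s2", "s3")
succ = {("s1", "s2"), ("s2", "s3"), ("s3", "s1")}
overlaps = {(s, s) for s in segments} | {
    ("s1", "s2"), ("s2", "s1"), ("s1", "s3"), ("s3", "s1")}
occupied = {"s2", "s3"}          # t1 is on s2 and t2 is on s3
closed = {"s1", "s2"}            # only s3 has an open gate

def c1(succ, occupied, closed):
    """Predecessors of occupied segments have closed gates."""
    return all(a in closed for (a, b) in succ if b in occupied)

def c1_prime(succ, overlaps, occupied, closed):
    """Predecessors of segments overlapping an occupied segment have closed gates."""
    return all(
        a in closed
        for (a, b) in succ
        for (b2, c) in overlaps
        if b2 == b and c in occupied
    )

print(c1(succ, occupied, closed))
print(c1_prime(succ, overlaps, occupied, closed))
```

The state passes the weaker constraint but fails the stronger one, because s3 precedes s1, which overlaps the occupied segment s2.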



Unfortunately, when we attempt to prove this latest version automatically, 
we fail again. Returning to the model finder, we attempt to falsify this statement, 
which succeeds via the countermodel shown in Figure 4. As the picture makes 
clear, the problem is that two trains were able to move to the same segment 
simultaneously, because two distinct predecessors of the segment had open gates 
at the same time. To disallow this, we formulate the following constraint: 

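$$\forall x .\ C_2(x) \Leftrightarrow \forall s_1, s_2 .\ [s_1 \neq s_2 \land (\exists s .\ \mathit{succ}(s_1, s) \land \mathit{succ}(s_2, s)) \land \neg\mathit{closed}(s_1, x)] \Rightarrow \mathit{closed}(s_2, x)$$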



Fig. 3. A countermodel falsifying constraint $C_1$.

Fig. 4. A countermodel falsifying constraint $C_1'$.

Fig. 5. A countermodel falsifying constraint $C_2$ (trains $t_1$ and $t_2$ move).

This guarantees that, in any state, if two distinct segments have the same
successor and one of them has an open gate, then the other will have a closed gate.
This is an adaptation of the traffic rule which says that an intersection should
not show a green light in two different directions. We now redefine policy-safety
as follows:

$$\forall x, y .\ [\mathit{safe}(x) \land \mathit{reachable}(x, y) \land C_1'(x) \land C_2(x)] \Rightarrow \mathit{safe}(y)$$

But this version is not valid either. Attempting to falsify it results in the 
countermodel shown in Figure 5. The problem is essentially a generalization of 
the situation depicted by the countermodel in Figure 4. This time, $t_1$ and $t_2$ do
not move to the same segment, but to overlapping ones. This is possible because
the segments on which $t_1$ and $t_2$ are placed initially (namely, $s_2$ and $s_4$) have
overlapping successors and yet both of them have open gates at the same time.
We need to prohibit this. Let us say that two distinct segments are joinable
iff they have overlapping successors. That is, for all $s_1$ and $s_2$, $\mathit{joinable}(s_1, s_2)$
holds iff

$$s_1 \neq s_2 \land [\exists s_1', s_2' .\ \mathit{overlaps}(s_1', s_2') \land \mathit{succ}(s_1, s_1') \land \mathit{succ}(s_2, s_2')]$$

We then need to stipulate that of any two joinable segments, at most one has
an open gate. We express this via a new constraint $C_2'$ as follows:

$$C_2'(x) \Leftrightarrow \forall s_1, s_2 .\ [\mathit{joinable}(s_1, s_2) \land \neg\mathit{closed}(s_1, x) \Rightarrow \mathit{closed}(s_2, x)]$$
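
The joinability relation and the constraint built on it can be sketched in Python as follows. The four-segment topology is a hypothetical layout in the spirit of Figure 5 (two segments whose successors overlap), not a transcription of it, and the function names and set-based encoding are assumptions of this sketch.

```python
def joinable(a, b, succ, overlaps):
    """Distinct segments a and b have overlapping successors."""
    return a != b and any(
        (a, a2) in succ and (b, b2) in succ
        for (a2, b2) in overlaps
    )

def c2_prime(segments, closed, succ, overlaps):
    """Of any two joinable segments, at most one has an open gate."""
    return all(
        b in closed
        for a in segments for b in segments
        if joinable(a, b, succ, overlaps) and a not in closed
    )

segments = ["s1", "s2", "s3", "s4"]
succ = {("s2", "s1"), ("s4", "s3")}
overlaps = {(s, s) for s in segments} | {("s1", "s3"), ("s3", "s1")}

print(joinable("s2", "s4", succ, overlaps))
print(c2_prime(segments, set(), succ, overlaps))      # both gates open
print(c2_prime(segments, {"s4"}, succ, overlaps))     # one gate closed

# Two segments sharing a successor are joinable because overlaps is reflexive.
fork_succ = {("a", "c"), ("b", "c")}
fork_overlaps = {(s, s) for s in ("a", "b", "c")}
```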

Observe that $C_2'$ implies $C_2$ (since overlaps is reflexive), hence it is no longer
necessary to state $C_2$. Therefore, our safety statement now becomes:
(define policy-safety
  (forall ?x ?y
    (if (and (safe ?x)
             (reachable ?x ?y)
             (C1' ?x)
             (C2' ?x))
        (safe ?y))))

This time, the attempt (!prove policy-safety) succeeds instantly, confirming
that we have finally arrived at a sound safety policy. 
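
As an independent sanity check (not a substitute for the automated proof), the final policy can be tested by brute force on very small universes. The Python sketch below is an assumption-laden illustration: it fixes exactly two trains, enumerates every topology satisfying axioms (2) through (5) over a given number of segments, and searches for a safe state with an unsafe reachable successor. The function names and the tuple returned for a countermodel are hypothetical.

```python
from itertools import product

def policy(succ, overlaps, closed, occupied):
    """Conjunction of the two final constraints, evaluated on one state."""
    c1p = all(a in closed
              for (a, b) in succ for (b2, c) in overlaps
              if b2 == b and c in occupied)
    def joinable(a, b):
        return a != b and any((a, p) in succ and (b, q) in succ
                              for (p, q) in overlaps)
    segs = {s for (s, _) in overlaps}          # overlaps is reflexive
    c2p = all(b in closed for a in segs for b in segs
              if joinable(a, b) and a not in closed)
    return c1p and c2p

def find_countermodel(n_segments, constraints_on):
    segs = list(range(n_segments))
    off_diag = [(a, b) for a in segs for b in segs if a != b]
    upper = [(a, b) for a in segs for b in segs if a < b]
    for succ_bits in product([False, True], repeat=len(off_diag)):
        succ = {p for p, bit in zip(off_diag, succ_bits) if bit}    # axiom (2)
        if any((a, c) in succ for (a, b) in succ for (b2, c) in succ if b2 == b):
            continue                                                # axiom (3) fails
        for ov_bits in product([False, True], repeat=len(upper)):
            overlaps = {(s, s) for s in segs}                       # axiom (4)
            for (a, b), bit in zip(upper, ov_bits):
                if bit:
                    overlaps |= {(a, b), (b, a)}                    # axiom (5)
            for closed_bits in product([False, True], repeat=n_segments):
                closed = {s for s, bit in zip(segs, closed_bits) if bit}
                for x1, x2, y1, y2 in product(segs, repeat=4):
                    if (x1, x2) in overlaps or (y1, y2) not in overlaps:
                        continue                   # need x safe and y unsafe
                    moves = [(x1, y1), (x2, y2)]
                    if any(a != b and ((a, b) not in succ or a in closed)
                           for a, b in moves):
                        continue                   # y is not reachable from x
                    if constraints_on and not policy(succ, overlaps, closed, {x1, x2}):
                        continue                   # x violates the policy
                    return succ, overlaps, closed, (x1, x2), (y1, y2)
    return None
```

With the constraints switched off the search finds a countermodel already for two segments, in line with Figure 1; with them switched on it finds none for the small sizes tried, as the soundness result predicts.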



4 Structured Proof Representation 

We have automatically verified the soundness of our safety policy, and while 
that should boost our confidence in the policy, it is not quite good enough. As 
engineers, we should be able to convince others that our policy is indeed sound — 
we should be able to justify our policy with a solid argument. That justification 
should take the form of a rigorous mathematical proof. However, not just any 
proof will do. A formal proof of an important system property should serve as 
human-readable documentation: it should explain why the property holds. To 
that end, the proof should be structured, given in a natural deduction style 
resembling common mathematical reasoning, and at a high level of abstraction. 

Athena proofs are expressed in a block-structured (“Fitch-style” [25]) natural 
deduction calculus. High-level reasoning idioms that are frequently encountered 
in mathematical practice are directly available to the user, have a simple seman- 
tics, and help to make proofs readable and writable. Athena’s off-the-shelf ATP 
technology can be used to automatically dispense with tedious steps, focusing 
instead on the interesting parts of the argument and keeping the reasoning at a 
high level of detail. 

Most interestingly, a block-structured natural deduction format is used not 
only for writing proofs, but also for writing tactics (“methods” in Athena ter- 
minology). This is a novel feature of Athena; all other tactic languages we are 
aware of are based on sequent calculi. Tactics in this style are considerably easier 
to write and remarkably useful in making proofs more modular and abstract. As
this example will illustrate, writing methods can pay dividends even in simple 
proofs. 

In what follows we present a formal Athena proof of the safety of our policy. 
As our starting point, and for purposes of comparison, consider first a rigorous 
proof of the result in English: 

Theorem 1. For all states $x$ and $y$, if (1) $x$ is safe; (2) $y$ is reachable from $x$;
and (3) $x$ satisfies constraints $C_1'$ and $C_2'$; then $y$ is also safe.

Proof. Pick arbitrary states $x$ and $y$ and assume that $x$ is safe; that $y$ is reachable
from $x$; and that $C_1'(x)$ and $C_2'(x)$ hold. Under these assumptions, we are to prove
that $y$ is safe.
We will proceed by contradiction. Suppose, in particular, that $y$ is not safe.
Then, by the definition of safety, there must be two distinct trains $t_1$ and $t_2$ on
overlapping segments in $y$, that is, we must have $t_1 \neq t_2$ and

$$\mathit{overlaps}(\mathit{segOf}(t_1, y), \mathit{segOf}(t_2, y)) \qquad (7)$$

We now ask: did either train move in the transition from state $x$ to $y$, or did
both stay on the same segment? Exactly one of these two possibilities must be
the case, i.e., we must have either

$$\mathit{case}_1 \equiv [\mathit{segOf}(t_1, y) \neq \mathit{segOf}(t_1, x)] \lor [\mathit{segOf}(t_2, y) \neq \mathit{segOf}(t_2, x)]$$

($t_1$ moved or $t_2$ moved); or else:

$$\mathit{case}_2 \equiv [\mathit{segOf}(t_1, y) = \mathit{segOf}(t_1, x)] \land [\mathit{segOf}(t_2, y) = \mathit{segOf}(t_2, x)]$$

(neither one moved). The disjunction $\mathit{case}_1 \lor \mathit{case}_2$ holds by the law of the
excluded middle. We will now show that a contradiction ensues in either case.



Consider $\mathit{case}_2$ first, i.e., assume

$$\mathit{segOf}(t_1, y) = \mathit{segOf}(t_1, x) \qquad (8)$$

$$\mathit{segOf}(t_2, y) = \mathit{segOf}(t_2, x) \qquad (9)$$

Then, from (8), (9), and (7), we conclude

$$\mathit{overlaps}(\mathit{segOf}(t_1, x), \mathit{segOf}(t_2, x)) \qquad (10)$$

i.e., that the segments of $t_1$ and $t_2$ in state $x$ overlap. But $t_1$ and $t_2$ are distinct
trains, so that would mean that state $x$ is unsafe: that it has two distinct trains on
overlapping segments. This is a contradiction, since we have explicitly assumed
that $x$ is safe.

Consider now $\mathit{case}_1$, where at least one of the trains has moved in the
transition from $x$ to $y$. Without loss of generality, assume that $t_1$ moved, so that

$$\mathit{segOf}(t_1, y) \neq \mathit{segOf}(t_1, x) \qquad (11)$$

From this, along with the hypothesis that $y$ is reachable from $x$ and the definition
of reachability, we conclude that the segment of $t_1$ in $y$ is a successor of the
segment of $t_1$ in $x$; and that the segment of $t_1$ in $x$ had an open gate:

$$\mathit{succ}(\mathit{segOf}(t_1, x), \mathit{segOf}(t_1, y)) \qquad (12)$$

$$\neg\mathit{closed}(\mathit{segOf}(t_1, x), x) \qquad (13)$$

We now perform a case analysis depending on whether or not $t_2$ moved as
well. Suppose first that, like $t_1$, $t_2$ also moved, so that:

$$\mathit{segOf}(t_2, y) \neq \mathit{segOf}(t_2, x) \qquad (14)$$
As before, this entails (in tandem with the reachability of $y$ from $x$) the following:

$$\mathit{succ}(\mathit{segOf}(t_2, x), \mathit{segOf}(t_2, y)) \qquad (15)$$

$$\neg\mathit{closed}(\mathit{segOf}(t_2, x), x) \qquad (16)$$

But this means that the segments $\mathit{segOf}(t_1, x)$ and $\mathit{segOf}(t_2, x)$ are joinable: (a)
they are distinct (if they were identical, then $x$ would be unsafe, since $t_1 \neq t_2$ and
overlaps is reflexive, contrary to our assumption); and (b) they have overlapping
successors (from (7), (12), and (15)). Therefore, state $x$ has two joinable segments
with simultaneously open gates, a condition that is explicitly prohibited by $C_2'$,
which is supposedly observed in $x$. Hence a contradiction.

By contrast, suppose that $t_2$ did not move during this state transition:

$$\mathit{segOf}(t_2, y) = \mathit{segOf}(t_2, x) \qquad (17)$$

In that case, (7) entails

$$\mathit{overlaps}(\mathit{segOf}(t_1, y), \mathit{segOf}(t_2, x)) \qquad (18)$$

This case violates constraint $C_1'$ in state $x$: $\mathit{segOf}(t_1, x)$, the segment of $t_1$ in $x$,
is the predecessor of a segment that overlaps with an occupied segment, namely,
$\mathit{segOf}(t_2, x)$. Therefore, according to $C_1'$, it should have its gate closed. But it
does not, a contradiction.

This concludes the case analysis of whether $t_2$ moved, on the assumption
that $t_1$ has moved. A symmetric argument can be given on the assumption that
$t_2$ has moved. □

This is a perfectly rigorous proof, with one exception: the phrase “without
loss of generality” is vague. Nevertheless, it is a frequent mathematical
colloquialism. Typically, it means that there is a finite number of cases to consider
$c_1, \ldots, c_n$, and it does not really make a difference which $c_i$ we analyze because
the reasoning for one of them can be readily applied to the others. This is
reiterated in the closing remark that “a symmetric argument can be given on
the assumption that $t_2$ has moved.”

These colloquialisms can be given more precise meaning with the help of
algorithmic notions. What we really are saying above is that any proof for a
particular $c_i$ can be abstracted (over a number of appropriate parameters) into
a general proof algorithm that can be just as well applied to the other cases.
That is, we are claiming that there is a tactic that will produce the desired
conclusion in any given case. In the Athena proof of the safety result, shown in
Figure 6, we formulate such a method M that is capable of performing the correct
analysis on a variable input assumption of which train has moved first. Treating
both cases then becomes simply a matter of invoking (!M t1 t2) first and then
transposing the arguments and invoking (!M t2 t1) for the second case.

The technical details of the Athena proof are not so important; the interested 
reader could follow them by consulting a description of the formal syntax and 
semantics of the language [4]. The important points are: (a) the Athena proof 
(policy-safety BY
  (pick-any x y
    (assume-let ((hyp (and (safe x)
                           (reachable x y)
                           (C1' x)
                           (C2' x))))
      (!by-contradiction
        (assume (not (safe y))
          (dlet ((P (!derive (exists ?t1 ?t2
                               (and (not (= ?t1 ?t2))
                                    (overlaps (segOf ?t1 y) (segOf ?t2 y))))
                             [(not (safe y)) safe-definition])))
            (pick-witnesses (t1 t2) P
              (dlet ((t-property (and (not (= t1 t2))
                                      (overlaps (segOf t1 y) (segOf t2 y))))
                     (t-distinct (!derive (not (= t1 t2)) [t-property]))
                     (t-overlapping (!derive (overlaps (segOf t1 y) (segOf t2 y)) [t-property]))
                     (one-has-moved (!derive (or (not (= (segOf t1 y) (segOf t1 x)))
                                                 (not (= (segOf t2 y) (segOf t2 x))))
                                             [hyp safe-definition t-property]))
                     (M (method (r1 r2)
                          (assume-let ((hyp1 (not (= (segOf r1 y) (segOf r1 x)))))
                            (dlet ((P1 (!derive (succ (segOf r1 x) (segOf r1 y))
                                                [hyp1 reachable-definition hyp]))
                                   (P2 (!derive (not (closed (segOf r1 x) x))
                                                [hyp1 reachable-definition hyp]))
                                   (c1 (assume-let ((case1 (not (= (segOf r2 y) (segOf r2 x)))))
                                         (dlet ((P3 (!derive (succ (segOf r2 x) (segOf r2 y))
                                                             [case1 reachable-definition hyp]))
                                                (P4 (!derive (not (closed (segOf r2 x) x))
                                                             [case1 reachable-definition hyp]))
                                                (P5 (!derive (not (= (segOf r1 x) (segOf r2 x)))
                                                             [hyp t-distinct safe-definition
                                                              (reflexive overlaps)]))
                                                (P6 (!derive (joinable (segOf r1 x) (segOf r2 x))
                                                             [P3 P1 t-overlapping P5
                                                              joinable-definition
                                                              (symmetric overlaps)])))
                                           (!derive false [C2'-definition P2 P4 P6 hyp]))))
                                   (c2 (assume-let ((case2 (= (segOf r2 y) (segOf r2 x))))
                                         (dlet ((P7 (!derive (occupied (segOf r2 x) x)
                                                             [case2 occupied-definition]))
                                                (P8 (!derive (overlaps (segOf r1 y) (segOf r2 x))
                                                             [case2 t-overlapping
                                                              (symmetric overlaps)])))
                                           (!derive false [P7 P8 P2 hyp P1 C1'-definition
                                                           (symmetric overlaps)])))))
                              (!by-cases c1 c2 [])))))
                     (say-t1-moves ((if (not (= (segOf t1 y) (segOf t1 x))) false) BY (!M t1 t2)))
                     (say-t2-moves ((if (not (= (segOf t2 y) (segOf t2 x))) false) BY (!M t2 t1))))
                (!by-cases say-t1-moves say-t2-moves [one-has-moved])))))))))



Fig. 6. Athena proof of safety 



reads more or less like the English proof, in that the overall structure of both 
proofs as well as the granularity of the individual inferences are similar; and (b) 
tactics in block-structured natural deduction style are easy to write and useful 
even in simple proofs. 

5 Related Work 

Alloy [13] is also aimed at automatic analysis of abstract system designs (mainly 
software systems). Alloy’s specification language is based on Tarski’s relational 
calculus, with heavy influences from Z [27]. It has been successfully used to 
analyze several systems, e.g. exposing bugs in Microsoft’s COM [14] and in a 
naming architecture for dynamic networks [18]. The case study we presented in 
this paper was originally done in Alloy [12]. The main difference is that Alloy does 
not have a notion of proof, and hence it cannot be used to verify infinite-state 
systems. It can detect the presence of bugs but cannot establish their absence. 
Prioni [3] is a tool that attempts to bridge this gap by integrating Alloy with
Athena. Athena proofs in Prioni are based on an explicit first-order axiomati- 
zation of the calculus of relations and a lemma library containing various useful 
results for that calculus (e.g., that the transpose and reflexive transitive closure 
of a homogeneous binary relation commute). Essentially, the Alloy calculus of 
relations is used as an object language and Athena is used as a metalanguage to 
manipulate and reason about the object language. This indirection can compli- 
cate the reasoning required for the proof effort. Automation is also hindered in 
Prioni. For instance, the completely automatic soundness proof that we obtained 
for this case study would be highly unlikely in Prioni because the assumption 
base would be overpopulated with the entire axiomatization of the relational 
calculus and the lemma library. It is well known that ATPs get overwhelmed by 
large sets of premises [26]. Finally, Prioni’s integration of Alloy and Athena is 
not seamless in the sense that the user must be fluent in both systems in order 
to use the tool. By contrast, in the approach we illustrated in this paper, the 
integration of Paradox into Athena is completely seamless; the user has no idea 
that Paradox is running under the hood. 

This opaque use of Paradox is enabled by an automatic translation from 
Athena’s multi-sorted logic to the standard single-sorted TPTP input format 
[28] of Paradox and back.* The translation is written in Athena itself, leveraging
its facilities for manipulating propositions. Once a model has been produced, an- 
other translation takes place that transforms the Paradox output into a format 
that makes sense to the Athena user (as shown in Figure 1). Similar remarks 
can be made about Athena’s use of ATPs. To take a simple example, Vampire
has a fairly limited lexical notion of variables (e.g., they must begin with cer- 
tain letters), whereas Athena is more liberal (e.g., names such as ?*d2-E3 are 
legal). Incompatibilities of this kind are resolved silently during the automatic 
translation. This approach pays heed to the lessons that have emerged from suc- 
cessful case studies, which stress that “evidence of the ATP must be hidden,” 
and that due attention must be given to pragmatic issues, e.g., implementation 
restrictions such as reserved identifiers, length of symbols, etc. — issues that are 
“often overlooked in research-oriented environments” although “they are impor- 
tant prerequisites for successful applications of ATPs” [26]. 

Some ATPs such as Gandalf [29] and Spass [31] have model-building capabili- 
ties in addition to standard resolution-based refutation. However, direct applica- 
tion of such a batch-oriented ATP is challenging because of its “low bandwidth 
of interaction,” which renders it “a toolkit with only one single tool” [7]. As 
Schumann puts it, such ATPs are like racing cars: they are fast and powerful
but cannot be used in everyday traffic because essentials such as headlights
are missing [26]. The same can be said about Paradox. By contrast, interactive 
proof environments (such as Isabelle, PVS, Athena, etc.) provide the essentials 
as well as bells and whistles: rich specification languages, definitional facilities for 

* The standard embedding of many-sorted logic into single-sorted logic [21] is used in
going from Athena to Paradox; this is safe for our purposes, since the embedding is 
well-known to preserve finite models. 
incremental extension of proof scripts, abstraction mechanisms, computational
capabilities, tactics, etc. It therefore makes more sense to harness the power of 
ATPs from within such environments rather than as stand-alone applications. 

Theorem proving has been combined with model checking in systems such 
as PVS [23], ACL2 [17], SLAM [5], and others (SLAM’s “counterexample-driven 
refinement” is also somewhat similar to our notion of incremental abduction via 
countermodel generation). Model checking is different from model generation — 
it checks whether a formula holds in a given model, so there are two possible 
answers: “valid” or “invalid,” with the latter accompanied by an offending trace. 
The correctness of the model-checking algorithm implementation is crucial for 
the credibility of “valid” answers, as most model checkers do not emit proofs that 
can be independently checked (unlike ATPs such as Vampire and Spass, which 
do emit proofs). There have also been attempts to integrate some higher-order 
proof systems with first-order ATPs, as in the integration of Gandalf and HOL 
[11]. These are not entirely happy marriages, however, since some higher-order 
goals cannot be desugared into first-order logic; indeed, this remains an active 
research field. That is not an issue for Athena, which is a first-order system. 
Abstract. The aim of this work is the modeling and verification of concurrent
systems that are subject to dynamic changes by using extensions
of Petri nets. In previous studies, we have introduced net rewriting sys- 
tems and a subclass of these called reconfigurable nets. In a net rewriting 
system, a system configuration is described as a Petri net and a change 
in configuration is described as a graph rewriting rule. A reconfigurable 
net is a net rewriting system where a change in configuration amounts 
to a modification in the flow relations of the places in the domain of 
the involved rule in accordance with this rule, independently of the con- 
text in which this rewriting applies. In both models, the enabling of 
a rule depends only on the net topology. Here we introduce marked- 
controlled net rewriting systems and marked-controlled reconfigurable 
nets where the enabling of a rule also depends on the net marking. We 
show an implementation of marked-controlled reconfigurable nets with 
Petri nets. Even though the expressiveness of both models is the same, 
with marked-controlled reconfigurable nets, we can easily and directly 
model systems that change their structure dynamically. It may be more 
efficient to directly implement the methods of verification of properties 
of Petri nets on the marked-controlled reconfigurable nets model. 



1 Introduction 

A Petri net [15,16] is a formalism used to model, analyze, simulate, control and 
evaluate the behavior of distributed and concurrent systems. This formalism, 
however, does not offer a direct way to address modeling issues such as dynamic 
changes, multiple modes of operation, etc. Extensions of Petri nets
have been designed to allow for an easy formalization of such features. The ben- 
efit in terms of modeling power is usually at the expense of a loss in decidable 
properties [7]. A trade-off needs to be found between expressiveness and com- 
putability. The fundamental goal of this work is the modeling, simulation and 
verification of concurrent systems that are subject to dynamic changes. For example,
to model a system of printers in which we want to choose dynamically
between the printing of several copies of the same job on different printers or 
sequentially printing the copies on the same printer, the solution using marked- 
controlled net rewriting systems is very simple (see Fig. 1 and Fig. 2). It is 
important that the mechanism for handling dynamic changes in such systems 
be explicitly represented inside the model so that at each stage of product de- 
velopment, designers can experiment with the effects of structural changes (e.g., 
by using prototypes). This means that structural changes are taken into account
from the very beginning of the design process rather than handled by an exter- 
nal, global system (e.g., by some exception handling mechanism), designed and 
added to the model describing the system’s normal behavior. Thus, we are in 
favor of an internal and incremental description of changes over an external and 
uniform one, and a local handling of changes over a global one. 

The model of net rewriting systems introduced in [3,11,12,13] arises from two 
different lines of research. Both were conducted in the field of the Petri net for- 
malism with the goal of enhancing the expressiveness of the basic model of Petri 
nets so that it can support the description of concurrent systems that are sub- 
ject to dynamic changes. The first class of models covers various proposals for 
merging Petri nets with graph grammars [4,6,17] while the second class, which is 
best represented by Valk’s self-modifying nets [18,19], considers Petri nets whose
flow relations can vary at runtime. Both proposals lead to expressive models that 
have definite benefits with respect to modeling issues. However, most of the ba- 
sic decidable properties of Petri nets (place boundedness, reachability, deadlock 
and liveness) are lost for these extended models. Therefore, no automatic
verification tools can be implemented for these models. Reconfigurable nets, which
were introduced in [3,11,12,13] as a particular subclass of net rewriting systems, 
attempt to combine the most relevant aspects of both of these approaches and 
constitute a class of models for which each of the preceding fundamental proper- 
ties are decidable. The translation of this model into Petri nets is automatic [11, 
12,13]. This equivalence ensures that all the fundamental properties of Petri nets 
are still decidable for reconfigurable nets, and therefore, this model is amenable 
to automatic verification tools. In contrast, the class of net rewriting systems is 
Turing powerful [11,12,13], and thus automatic verification is no longer possible 
for this larger class. 

In net rewriting systems, a system configuration is described as a Petri net 
and a change in configuration is described as a graph rewriting rule which con- 
sists of replacing part of the system (the part that matches the left-hand side 
of the rewriting rule) with another one (given by the right-hand side of the 
rewriting rule). A reconfigurable net is a net rewriting system where a change in
configuration is limited to the modification of the flow relations of the places in 
the domain of the rewriting rule involved. In other words, for a rewriting rule 
to be enabled, so that a change in configuration can take place, the net topol- 
ogy is the only thing to be taken into account. However, most of the changes 
in configuration that occur in real systems depend on the state of the system 
(represented in a net by its marking). It would therefore be interesting for the 
enabling of a rewriting rule not only to depend on the net topology but to also 
depend on the net marking. Basically, the idea is to add a control mechanism
to net rewriting systems and to reconfigurable nets so that the net only makes 
changes in configuration when a certain minimal marking is reached. 

Section 2 introduces marked-controlled net rewriting systems. In Section 3, 
we present the definition of marked-controlled reconfigurable nets and a detailed 
example. An implementation of marked-controlled reconfigurable nets with Petri 
nets is shown in Section 4. Finally, we describe some related works and we present 
our conclusions in Section 5. 



2 Marked-Controlled Net Rewriting Systems 

This section introduces the model of marked-controlled net rewriting systems. 
First, we establish some notations. 

If $R \subseteq X \times Y$ is a binary relation, we let $X'R = \{y \in Y \mid \exists x \in X',\ (x, y) \in R\}$
denote the image of $X' \subseteq X$, and $RY' = \{x \in X \mid \exists y \in Y',\ (x, y) \in R\}$ denote
the inverse image of $Y' \subseteq Y$. The domain of $R$ is then $\mathit{Dom}(R) = RY$ and the
codomain of $R$ is $\mathit{Cod}(R) = XR$.
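
As a quick illustration of this notation, the image and inverse image can be computed directly when a relation is stored as a set of pairs. The sketch below is only an example: the function names `image` and `inverse_image` and the sample relation are ours and do not appear in the original text.

```python
def image(rel, xs):
    """X'R: every y related to at least one x in xs."""
    return {y for (x, y) in rel if x in xs}


def inverse_image(rel, ys):
    """RY': every x related to at least one y in ys."""
    return {x for (x, y) in rel if y in ys}


# Hypothetical relation R over X = {1, 2, 3} and Y = {"a", "b", "c"}.
X = {1, 2, 3}
Y = {"a", "b", "c"}
R = {(1, "a"), (1, "b"), (3, "b")}

dom = inverse_image(R, Y)  # Dom(R) = RY
cod = image(R, X)          # Cod(R) = XR
```

Here 2 is absent from the domain and "c" from the codomain because neither occurs in any pair of R.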

A Petri net [15,16] is a tuple $\Gamma = (P, T, F)$ where: $P = \{p_1, p_2, \ldots, p_m\}$ is a
finite set of places, $T = \{t_1, t_2, \ldots, t_n\}$ is a finite set of transitions ($P \cap T = \emptyset$,
$P \cup T \neq \emptyset$) and $F : (P \times T) \cup (T \times P) \to \{0, 1, 2, 3, \ldots\}$ is a set of weighted arcs
(flow relation). A marked Petri net is a pair $(\Gamma, M_0)$ where $\Gamma$ is a Petri net and
$M_0 : P \to \{0, 1, 2, 3, \ldots\}$ is the initial marking.
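
A marked Petri net maps naturally onto a small data structure. The sketch below stores the flow relation as a dictionary of arc weights (an absent arc has weight 0) and adds the usual firing rule, which the text takes from [15,16] without restating it: a transition t is enabled at M when M(p) >= F(p, t) for every place p, and firing it yields M'(p) = M(p) - F(p, t) + F(t, p). The class name, method names and the toy net are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    places: set
    transitions: set
    flow: dict = field(default_factory=dict)  # (x, y) -> arc weight; absent means 0

    def F(self, x, y):
        """Weight of the arc from x to y."""
        return self.flow.get((x, y), 0)

    def enabled(self, marking, t):
        """t is enabled when every place holds at least F(p, t) tokens."""
        return all(marking[p] >= self.F(p, t) for p in self.places)

    def fire(self, marking, t):
        """Return the marking reached by firing t (standard firing rule)."""
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t!r} is not enabled")
        return {p: marking[p] - self.F(p, t) + self.F(t, p) for p in self.places}


# Toy net (hypothetical): p1 --2--> t1 --1--> p2
net = PetriNet({"p1", "p2"}, {"t1"}, {("p1", "t1"): 2, ("t1", "p2"): 1})
m0 = {"p1": 3, "p2": 0}
m1 = net.fire(m0, "t1")
```

After one firing only a single token is left in p1, so t1 can no longer fire.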

Let $\Gamma = (P, T, F)$ and $\Gamma' = (P', T', F')$ be two Petri nets. We call $\Gamma$ and
$\Gamma'$ isomorphic if there exists a bijection $\varphi : (P \cup T) \to (P' \cup T')$ such that
$F(x, y) = F'(\varphi(x), \varphi(y))$ for all $x, y \in P \cup T$.

A full embedding of a Petri net $\Gamma = (P, T, F)$ into a Petri net $\Gamma' = (P', T', F')$
is an injective map $f : P \cup T \to P' \cup T'$ that maps places to places and transitions
to transitions ($f(P) \subseteq P'$ and $f(T) \subseteq T'$) such that for any pair of elements
$x, y \in P \cup T$, $F(x, y) = F'(f(x), f(y))$. The image of $\Gamma$ by $f$ is then called a full
subnet of $\Gamma'$.
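
The three conditions of a full embedding (injectivity, preservation of node kind, and equality of arc weights on every pair of nodes) can be checked mechanically. The following is a minimal sketch under our own encoding: a net is a triple (places, transitions, flow dictionary) and the map f is a Python dict. None of these names come from the original text.

```python
def weight(flow, x, y):
    """Arc weight F(x, y); arcs absent from the dictionary have weight 0."""
    return flow.get((x, y), 0)


def is_full_embedding(f, src, dst):
    """Check whether the dict f is a full embedding of net src into net dst."""
    P, T, F = src
    P2, T2, F2 = dst
    nodes = P | T
    # f must be defined exactly on P ∪ T and be injective.
    if set(f) != nodes or len(set(f.values())) != len(nodes):
        return False
    # Places go to places, transitions go to transitions.
    if not all(f[p] in P2 for p in P) or not all(f[t] in T2 for t in T):
        return False
    # Arc weights must agree on every ordered pair of nodes.
    return all(weight(F, x, y) == weight(F2, f[x], f[y])
               for x in nodes for y in nodes)


# Hypothetical nets: a single arc a -> u, a chain p -> t -> q, and a loop p <-> t.
small = ({"a"}, {"u"}, {("a", "u"): 1})
chain = ({"p", "q"}, {"t"}, {("p", "t"): 1, ("t", "q"): 1})
loop = ({"p"}, {"t"}, {("p", "t"): 1, ("t", "p"): 1})

ok = is_full_embedding({"a": "p", "u": "t"}, small, chain)
not_full = is_full_embedding({"a": "p", "u": "t"}, small, loop)
```

The second map fails because the target net has an arc from t back to p that has no counterpart between u and a, so the image is not a full subnet.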

Definition 1. A marked-controlled net rewriting system is a structure $N =
(\mathcal{R}, (\Gamma_0, M_0))$ where $\mathcal{R} = \{r_1, \ldots, r_h\}$ is a finite set of rewriting rules and
$(\Gamma_0, M_0)$ is a marked Petri net.

A rewriting rule $r \in \mathcal{R}$ is a structure $r = (L, R, \tau, {}^{\bullet}\tau, \tau^{\bullet}, C, M)$ where:

1. $L = (P_L, T_L, F_L)$ and $R = (P_R, T_R, F_R)$ are Petri nets called the left-hand
side and the right-hand side of $r$, respectively;

2. $\tau \subseteq (P_L \times P_R) \cup (T_L \times T_R)$, called the transfer relation of $r$, is a binary relation
relating places of $L$ to places of $R$ and transitions of $L$ to transitions of $R$:
$P_L\tau \subseteq P_R$, $\tau P_R \subseteq P_L$, $T_L\tau \subseteq T_R$ and $\tau T_R \subseteq T_L$;

3. ${}^{\bullet}\tau \subseteq \tau$ and $\tau^{\bullet} \subseteq \tau$ are sub-relations of the transfer relation called the input
interface relation and the output interface relation, respectively;

4. $C$ is a finite set of places, called control places, which is a subset of places
of $L$, $C \subseteq P_L$;

5. $M$ is the minimum marking of places of $C$ required so that the rule is enabled.
A configuration of a marked-controlled net rewriting system $N$ is a Petri net
$\Gamma = (P, T, F)$.

A state of a marked-controlled net rewriting system $N$ is a marked Petri net
$(\Gamma, M)$. The pair $(\Gamma_0, M_0)$ is called the initial state of the marked-controlled net
rewriting system.

An event of a marked-controlled net rewriting system is either a transition
or a rewriting rule: $E = T \cup \mathcal{R}$.



Definition 2. A net rewriting system [3,11,12,13] is a marked-controlled net
rewriting system where the set of control places $C$ is empty ($C = \emptyset$) for all
rewriting rules, that is, there is no restriction on the net marking.

In order to apply a rewriting rule $r$ to a configuration $\Gamma$, one must first
identify a full subnet $\Gamma'$ of $\Gamma$ that is isomorphic to the left-hand side of the
rule; that is, there exists a bijection $\varphi : (P_L \cup T_L) \to (P' \cup T')$ such that
$F_L(x, y) = F'(\varphi(x), \varphi(y))$ for all $x, y \in P_L \cup T_L$. The elements of $\Gamma$ (places or
transitions) that do not belong to $\Gamma'$ constitute the context of the rule. It is
also required that $\forall p \in C,\ M(\varphi(p)) \geq M(p)$, where the left-hand $M$ is the current
marking of the net and the right-hand $M$ is the minimum marking of the rule.
In order for the rule to be enabled, an element $x'$ of $\Gamma'$ may have an element $x$
of its preset that belongs to the context only if $x'$ belongs to the input interface
of the rule. In addition, an element $x'$ of $\Gamma'$ may have an element $x$ of its postset
that belongs to the context only if $x'$ belongs to the output interface of the rule.
When these conditions are met, the rewriting can take place and it proceeds by
replacing the subnet $\Gamma'$ with the right-hand side $R$ of the rule and by fixing the
connections between the elements of $R$ and those in the context according to the
interface relations. The transfer relation is not only used to rewrite the left-hand
side of the rule with the right-hand side but it is also used to transfer the tokens
in $\Gamma'$ to $R$. Notice that, since the transfer relation can be any type of relation,
tokens may be duplicated or may disappear.

The dynamic evolution of a marked-controlled net rewriting system is given 
by its state graph. 

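Definition 3. The state graph of a marked-controlled net rewriting system $N =
(\mathcal{R}, (\Gamma_0, M_0))$ is the labeled directed graph whose nodes are the states of $N$ (i.e.,
marked Petri nets) and whose arcs (labeled with events of $N$) are of two kinds
described below:

— firing of a transition: arcs from state $(\Gamma, M)$ to state $(\Gamma', M')$ that are
labeled with transition $t$ when transition $t$ can fire in the net $\Gamma$ at marking
$M$ and leads to marking $M'$:

$$(\Gamma, M) \xrightarrow{t} (\Gamma', M') \iff (\Gamma = \Gamma' \text{ and } M[t\rangle M' \text{ in } \Gamma).$$

— change in configuration: arcs from state $(\Gamma, M)$ to state $(\Gamma', M')$ that are
labeled with rule $r = (L, R, \tau, {}^{\bullet}\tau, \tau^{\bullet}, C, M) \in \mathcal{R}$, when there exists a full
embedding $f : L \to \Gamma$ such that $\forall x \notin f(L)$ and $y \in P_L \cup T_L$:

$$x \in {}^{\bullet}f(y) \Rightarrow y \in \mathit{Dom}({}^{\bullet}\tau) \quad \text{and} \quad x \in f(y)^{\bullet} \Rightarrow y \in \mathit{Dom}(\tau^{\bullet})$$

and $\forall p \in C,\ M(f(p)) \geq M(p)$ and the following holds where $\Gamma = (P, T, F)$
and $\Gamma' = (P', T', F')$:

$$P' = P - f(P_L) + P_R \quad \text{such that } P_L\tau \subseteq P_R$$
$$T' = T - f(T_L) + T_R \quad \text{such that } T_L\tau \subseteq T_R$$

where the meaning of $+$ ($-$) is adding (removing) places/transitions to (from)
$\Gamma$. The names of places $P_R$ (transitions $T_R$) added to $\Gamma$ must be new in order
to avoid clashes.

$$F'(x, y) = \begin{cases}
F(x, y) & \text{if } x, y \notin P_R \cup T_R \\
F_R(x, y) & \text{if } x, y \in P_R \cup T_R \\
\sum_{y' \in {}^{\bullet}\tau y} F(x, f(y')) & \text{if } x \notin P_R \cup T_R \wedge y \in P_R \cup T_R \\
\sum_{x' \in \tau^{\bullet} x} F(f(x'), y) & \text{if } x \in P_R \cup T_R \wedge y \notin P_R \cup T_R
\end{cases}$$

$$M'(p) = \begin{cases}
M(p) & \text{if } p \notin P_R \\
\sum_{p' \in \tau p} M(f(p')) & \text{if } p \in P_R
\end{cases}$$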



The following example illustrates the definition of a marked-controlled net 
rewriting system and the meaning of a change in configuration. 




Fig. 1. Controlled Net Rewriting System modeling a system of printers 



Example 1. The marked-controlled net rewriting system in Fig. 1 models the
printing of several copies of the same job using different printers. The token in 
the place buffer-printer represents one copy of the job to print. In this state 
(F,M), to obtain three copies, the job must be sent to print three times and 
these three copies can only be printed sequentially (one after the other). The 
rewriting rule triple offers the possibility of printing each one of them using 
a different printer (assuming there is some job to print). The subset of control
places is $C = \{\mathit{p\_ini}\}$ and $M(\mathit{p\_ini}) = 1$. The transfer relation $\tau$ is given
by $\tau = \{(\{\mathit{p\_ini}\}, \{p\_1, p\_2, p\_3\}), (\{t\_0\}, \{t\_1, t\_2, t\_3\}), (\{\mathit{p\_end}\}, \{\mathit{p\_end}\})\}$ and
the input and output interface relations are ${}^{\bullet}\tau = \{(\{\mathit{p\_ini}\}, \{p\_1, p\_2, p\_3\})\}$ and
$\tau^{\bullet} = \{(\{\mathit{p\_end}\}, \{\mathit{p\_end}\})\}$, respectively. Figure 2 shows the new state
due to the change in configuration caused by the rewriting rule.

Fig. 2. Change in configuration due to the rewriting rule triple



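Proposition 1. Marked-controlled net rewriting systems are Turing powerful.
Proof. Net rewriting systems are Turing powerful [11,12,13], and by Definition 2
they are exactly the marked-controlled net rewriting systems in which $C = \emptyset$ for
every rule; the result follows.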



3 Marked-Controlled Reconfigurable Nets 

Reconfigurable nets are a subclass of net rewriting systems where the transfer
relation $\tau$ is a bijection [3,11,12,13]. In other words, the set of places and transitions
is left unchanged by rewriting rules in reconfigurable nets. Such rules are enabled
if and only if the net has a certain topology. Moreover, such rules only change
the flow relations of the places in their domains. We introduce marked-controlled
reconfigurable nets as the subclass of marked-controlled net rewriting systems
where the transfer relation $\tau$ is a bijection. They extend reconfigurable nets in
that the enabling of a rewriting rule also depends on the net marking.

Definition 4. A marked-controlled reconfigurable net is a structure $N = (P, T,
\mathcal{R}, \gamma_0)$ where $P = \{p_1, \ldots, p_n\}$ is a nonempty and finite set of places, $T =
\{t_1, \ldots, t_m\}$ is a nonempty and finite set of transitions disjoint from $P$ ($P \cap T =
\emptyset$), $\mathcal{R} = \{r_1, \ldots, r_h\}$ is a finite set of rewriting rules, and $\gamma_0$ is the initial state.

A rewriting rule $r \in \mathcal{R}$ is a structure $r = (D, {}^{\bullet}r, r^{\bullet}, C, M)$ where $D \subseteq P$ is
the domain of $r$, ${}^{\bullet}r : (D \times T) \cup (T \times D) \to \mathbb{N}$ and $r^{\bullet} : (D \times T) \cup (T \times D) \to \mathbb{N}$
are the preconditions and postconditions of $r$ (i.e., they are the flow relations
of the domain places before and after the change in configuration due to rule $r$).
$C$ is a subset of places of $D$, $C \subseteq D$, called control places, and $M$ is the required
minimum marking of places of $C$ so that the rule can be enabled.

A configuration of a marked-controlled reconfigurable net is a Petri net $\Gamma =
(P, T, F)$.

A state $\gamma$ of a marked-controlled reconfigurable net is a marked Petri net
$\gamma = (\Gamma, M)$.

The events of a marked-controlled reconfigurable net are its transitions
together with its rewriting rules: $E = T \cup \mathcal{R}$.
Definition 5. A reconfigurable net [3,11,12,13] is a marked-controlled reconfigurable
net where the set of control places $C$ is empty ($C = \emptyset$) for all rewriting
rules; that is, there is no restriction on the net marking.

We represent a rewriting rule using formal sums notation as

$$r = \sum_{p \in D} p\Big(\sum_{t \in T} {}^{\bullet}r(p, t) \cdot t - \sum_{t \in T} {}^{\bullet}r(t, p) \cdot t\Big) \triangleright \sum_{p \in D} p\Big(\sum_{t \in T} r^{\bullet}(p, t) \cdot t - \sum_{t \in T} r^{\bullet}(t, p) \cdot t\Big)$$

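Definition 6. The configuration graph $G(N)$ of a marked-controlled reconfigurable
net $N = (P, T, \mathcal{R}, \gamma_0)$ is the labeled directed graph whose nodes are the
configurations, such that there is an arc from configuration $\Gamma$ to configuration
$\Gamma'$ labeled with rule $r = (D, {}^{\bullet}r, r^{\bullet}, C, M) \in \mathcal{R}$, which we denote $\Gamma[r\rangle\Gamma'$, if and
only if the following holds:

$$\forall p \in C,\ M(p) \geq M(p)$$
$$\forall p \in D,\ F(p, t) = {}^{\bullet}r(p, t) \text{ and } F(t, p) = {}^{\bullet}r(t, p)$$
$$\forall p \in D,\ F'(p, t) = r^{\bullet}(p, t) \text{ and } F'(t, p) = r^{\bullet}(t, p)$$
$$\forall p \notin D,\ F(p, t) = F'(p, t) \text{ and } F(t, p) = F'(t, p)$$

Notice that we require the control places to be marked with at least the
minimum marking $M$ of the rule (in the first condition, the left-hand $M$ is the
current marking and the right-hand $M$ is the minimum marking of the rule). The
flow relation must contain arcs of the exact multiplicity appearing in the left-hand
side of the rewriting rule, and we do not allow rewriting if arcs of a greater
multiplicity are present.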
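
The four conditions above can be sketched as an enabling test followed by a rewrite of the flow relation. This is only an illustration under our own encoding: a rule is a tuple (D, pre, post, C, m_min), flows are dictionaries of arc weights, and the function names are hypothetical. The toy rule doubles the weight of the arc from a place p2 to a transition t3, which is our reading of a rule of the form p2(t2 - t3) > p2(t2 - 2t3).

```python
def w(flow, x, y):
    """Arc weight; arcs absent from the dictionary have weight 0."""
    return flow.get((x, y), 0)


def rule_enabled(rule, flow, marking, transitions):
    """Control places sufficiently marked and exact arc match on the domain."""
    D, pre, post, C, m_min = rule
    if any(marking[p] < m_min[p] for p in C):
        return False
    return all(w(flow, p, t) == w(pre, p, t) and w(flow, t, p) == w(pre, t, p)
               for p in D for t in transitions)


def apply_rule(rule, flow, marking, transitions):
    """Return the flow of the new configuration; the marking is unchanged."""
    if not rule_enabled(rule, flow, marking, transitions):
        raise ValueError("rule is not enabled")
    D, pre, post, C, m_min = rule
    # Keep arcs that do not touch the domain, then install the postconditions.
    new_flow = {(x, y): k for (x, y), k in flow.items()
                if x not in D and y not in D}
    new_flow.update({arc: k for arc, k in post.items() if k > 0})
    return new_flow


# Hypothetical fragment: t2 -> p2 -> t3, with two tokens in p2.
T = {"t2", "t3"}
flow0 = {("t2", "p2"): 1, ("p2", "t3"): 1}
marking = {"p2": 2}
pre = {("t2", "p2"): 1, ("p2", "t3"): 1}
post = {("t2", "p2"): 1, ("p2", "t3"): 2}
double = ({"p2"}, pre, post, set(), {})            # no control places
guarded = ({"p2"}, pre, post, {"p2"}, {"p2": 3})   # needs 3 tokens in p2

flow1 = apply_rule(double, flow0, marking, T)
```

Once the rule has been applied it is no longer enabled, because the arc from p2 to t3 now has a multiplicity different from the one required by the preconditions; the guarded variant is never enabled here because p2 holds fewer tokens than its minimum marking.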

The dynamic evolution of a marked-controlled reconfigurable net is then given
by its state graph.

Definition 7. The state graph of a marked-controlled reconfigurable net $N =
(P, T, \mathcal{R}, \gamma_0)$ is the labeled directed graph whose nodes are states of $N$ and whose
arcs (labeled with events) are of two kinds:

— firing of a transition: arcs from state $(\Gamma, M)$ to $(\Gamma, M')$ that are labeled with
transition $t$ when transition $t$ can fire in the net $\Gamma$ at marking $M$ and leads
to marking $M'$,

— change in configuration: arcs from state $(\Gamma, M)$ to state $(\Gamma', M)$ that are
labeled with rule $r \in \mathcal{R}$ if $\Gamma[r\rangle\Gamma'$ is a transition of the configuration graph of
$N$.

In other words, the set of labeled arcs of the state graph of $N$ is given by

$$\{(\Gamma, M) \xrightarrow{t} (\Gamma, M') \mid M[t\rangle M' \text{ in } \Gamma\} \cup \{(\Gamma, M) \xrightarrow{r} (\Gamma', M) \mid \Gamma[r\rangle\Gamma' \text{ in } G(N)\}$$

The following example shows a system that is modeled by a marked-controlled
reconfigurable net and a change in configuration that depends on
both the net topology and the net marking.

Example 2 (Transmission Net). Figure 3 is the initial state $\gamma_0 = (\Gamma_0, M_0)$ of a
marked-controlled reconfigurable net that represents a transmission net that is
receiving blocks of information from two machines (represented by transitions
$t_1$ and $t_8$) to be carried to a main building (represented by place $p_7$). The initial
marking is $M_0 = (1, 2, 0, 1, 0, 0, 1)$.

Fig. 3. The initial state of transmission net


This net has two different parts that are delimited by places $p_1$, $p_3$ and $p_7$. In
each part, some changes in configuration can take place and they are represented
by rewriting rules. In the first part, the packages can be sent in ones, in twos, in
threes or in fours, depending on the weight of the arc from place $p_2$ to transition
$t_3$. In the second part, the net offers three possibilities for the forwarding of
data in the net: packages follow the normal path $t_4 p_4 t_5 p_5 t_6 p_6 t_7 p_7$; packages
avoid transition $t_5$ and follow the path $t_4 p_4 t_6 p_6 t_7 p_7$; and finally, packages can
go directly through the path $t_4 p_4 t_7 p_7$, avoiding transitions $t_5$ and $t_6$. We then
define eight rewriting rules using the formal sums notation previously introduced.
We show the last one ($R_8$) graphically in Fig. 4.

$R_1$: $p_2(t_2 - t_3) \triangleright p_2(t_2 - 2t_3)$
$R_2$: $p_2(t_2 - t_3) \triangleright p_2(t_2 - 3t_3)$
$R_3$: $p_2(t_2 - t_3) \triangleright p_2(t_2 - 4t_3)$
$R_4$: $p_2(t_2 - 2t_3) \triangleright p_2(t_2 - 3t_3)$
$R_5$: $p_2(t_2 - 2t_3) \triangleright p_2(t_2 - 4t_3)$
$R_6$: $p_2(t_2 - 3t_3) \triangleright p_2(t_2 - 4t_3)$
$R_7$: $p_4(t_4 + t_8 - t_5) + p_5(t_5 - t_6) \triangleright p_4(t_4 + t_8 - t_6) + p_5(0)$
$R_8$: $p_4(t_4 + t_8 - t_5) + p_5(t_5 - t_6) + p_6(t_6 - t_7) \triangleright p_4(t_4 + t_8 - t_7) + p_5(0) + p_6(0)$



The subset of control places $C$ is empty for all rewriting rules except for rules
$R_7$ and $R_8$ where $C = \{p_4\}$ and $M(p_4) = 1$ (i.e., to change the path in the second
part of the net, at least one package must be present in place $p_4$).



Fig. 4. Rewriting rule $R_8$




Therefore, the marked-controlled reconfigurable net consists of 7 places, 8
transitions and 8 rewriting rules. Figure 5 shows the state reached when rule $R_8$
is applied to the initial state represented in Fig. 3. In this new state, packages
are sent in ones and follow the path $t_4 p_4 t_7 p_7$, avoiding transitions $t_5$ and $t_6$.

Fig. 5. New state reached after applying rule $R_8$ to the state in Figure 3



4 Implementation of Marked-Controlled Reconfigurable 
Nets with Petri Nets 

We want to prove that marked-controlled reconfigurable nets are equivalent to 
Petri nets. This equivalence will ensure that all fundamental properties of Petri 
nets are still decidable for marked-controlled reconfigurable nets. This model 
is thus amenable to automatic verification tools. The translation of a marked- 
controlled reconfigurable net into an equivalent Petri net will considerably in- 
crease the net size and so it will be more efficient to directly implement the 
methods of verification of properties of Petri nets on the original model. 

At first glance, it might seem they are not equivalent because of the set of
rewriting rules of marked-controlled reconfigurable nets. When a rewriting rule
is applied in a marked-controlled reconfigurable net, a change in configuration
takes place (let $\Gamma_i[r\rangle\Gamma_j$ denote the change in configuration due to rewriting
rule $r$ from $\Gamma_i$ to $\Gamma_j$), i.e., a change in net structure. To obtain an equivalent
Petri net, these changes in configuration must be present (i.e., all possible
configurations must be represented in the net). The number of configurations
must therefore be finite for them to be represented.

Let $\mathit{Conf}(N) = \{\Gamma_0, \Gamma_1, \ldots, \Gamma_k\}$ denote the set of configurations of a
marked-controlled reconfigurable net $N$, where $\Gamma_0$ is the configuration of the
initial state $\gamma_0 = (\Gamma_0, M_0)$ (the initial configuration). We can then easily
construct an equivalent Petri net $\hat{N} = (\hat{P}, \hat{T}, \hat{F}, \hat{M}_0)$ whose set of places is
$\hat{P} = P \cup \{q_0, \ldots, q_k\}$, i.e., places of the original marked-controlled reconfigurable
net together with one specific place attached to each possible configuration. We
take as many copies of the set of transitions as the number of configurations,
namely $\{q_0, \ldots, q_k\} \times T$. We can imagine places and transitions of a configuration
$\Gamma_i = (P, T, F_i)$ as if they were located in two different parallel planes (i.e., a
plane with the set of places and a plane with the set of transitions connected by
flow relations of the represented configuration). Then we set up flow relations so
that the configuration $\Gamma_i$ is represented by the plane of transitions $\{q_i\} \times T$ and
the (shared) plane of places $P$:
$$\hat{F}(p, (q_i, t)) = F_i(p, t) \quad \text{where } \Gamma_i = (P, T, F_i)$$
$$\hat{F}((q_i, t), p) = F_i(t, p) \quad \text{where } \Gamma_i = (P, T, F_i)$$

Place $q_i$, which is associated to configuration $\Gamma_i$, contains at most one token,
and it is marked in the states that are associated to this configuration. Thus, we
set

$$\hat{F}(q_i, (q_j, t)) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}
\qquad
\hat{F}((q_j, t), q_i) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

Then it is only necessary to represent the changes in configuration. For that
purpose, it suffices to add one extra transition $r_{ij}$ for each change in configuration
$\Gamma_i[r\rangle\Gamma_j$, with a flow relation given by $\hat{F}(q_i, r_{ij}) = \hat{F}(r_{ij}, q_j) = 1$, $\hat{F}(p, r_{ij}) =
\hat{F}(r_{ij}, p) = M(p)$ if $p \in C$ and $\hat{F}(p, r_{ij}) = \hat{F}(r_{ij}, p) = 0$ otherwise. So, the set
of transitions is $\hat{T} = (\{q_0, \ldots, q_k\} \times T) \cup R$ where $R$ is the set of transitions
such that $r_{ij} \in R$ if $\exists r \in \mathcal{R}$ such that $\Gamma_i[r\rangle\Gamma_j$ in $G(N)$. The resulting Petri net
is initially marked as: $\hat{M}_0(p) = M_0(p)$ where $p \in P$ and

$$\hat{M}_0(q_i) = \begin{cases} 1 & \text{if } i = 0 \\ 0 & \text{otherwise.} \end{cases}$$

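To show the correspondence between marked-controlled reconfigurable nets
and Petri nets, some considerations should be taken into account:

- If $\tilde{M} : \tilde{P} \to \mathbb{N}$ is a reachable marking of $\tilde{N}$, then $\sum_{i=0}^{k} \tilde{M}(q_i) = 1$. So, from the
subset of transitions $\{q_0, \ldots, q_k\} \times T \subseteq \tilde{T}$, only the transitions in $\{q_i\} \times T$,
for the index $i$ with $\tilde{M}(q_i) = 1$, can be enabled.

- We denote $\gamma_{\tilde{M}} = (\Gamma_{\tilde{M}}, M)$, where $\tilde{M} : \tilde{P} \to \mathbb{N}$ is a marking of $\tilde{N}$, $M$ is the
restriction of $\tilde{M}$ to $P$, and $\Gamma_{\tilde{M}} = \Gamma_i$ is the configuration associated with $\tilde{M}$, i.e., the one
such that $\tilde{M}(q_i) = 1$. Inversely, if $\gamma = (\Gamma, M)$ is a reachable state of $N$, we associate
with it the mapping $\tilde{M}_\gamma : \tilde{P} \to \mathbb{N}$ with $\tilde{M}_\gamma(p) = M(p)$ if $p \in P$, $\tilde{M}_\gamma(q_i) = 1$ if $\Gamma = \Gamma_i$,
and $\tilde{M}_\gamma(q_i) = 0$ otherwise.

Then $\tilde{M}_{\gamma_{\tilde{M}}} = \tilde{M}$, $\gamma_{\tilde{M}_\gamma} = \gamma$, $\gamma_{\tilde{M}_0} = (\Gamma_0, M_0)$ and $\tilde{M}_{\gamma_0} = \tilde{M}_0$.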



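Proposition 2. If $\gamma = (\Gamma, M)$ is a reachable state of $N$ then

1. $\gamma[r\rangle\gamma' \iff \tilde{M}_\gamma[r_{ij}\rangle\tilde{M}_{\gamma'}$ where $\gamma = (\Gamma_i, M)$ and $\gamma' = (\Gamma_j, M)$.

2. $\gamma[t\rangle\gamma' \iff \tilde{M}_\gamma[(q_i, t)\rangle\tilde{M}_{\gamma'}$ where $\gamma = (\Gamma_i, M)$ and $\gamma' = (\Gamma_i, M')$.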

The previous proposition shows that: 

1. If we change from state $\gamma = (\Gamma_i, M)$ to state $\gamma' = (\Gamma_j, M)$ in a marked-controlled
reconfigurable net due to the firing of a rewriting rule, what differs
in the equivalent Petri net is the marking of places $q_i$ and $q_j$: $\tilde{M}_\gamma(q_i) = 1$,
$\tilde{M}_{\gamma'}(q_i) = 0$ and $\tilde{M}_\gamma(q_j) = 0$, $\tilde{M}_{\gamma'}(q_j) = 1$. The same holds in the other direction.

2. If we change from state $\gamma = (\Gamma_i, M)$ to state $\gamma' = (\Gamma_i, M')$ in a marked-controlled
reconfigurable net due to the firing of a transition $t$, what changes
in the equivalent Petri net is the marking of the places $p \in P$ involved in the
firing of the transition, and vice versa.

Hence we have established that: 

Proposition 3. The markings $\tilde{M}_\gamma$, where $\gamma$ ranges over the reachable states of $N$,
are the reachable markings of $\tilde{N}$, and the marking graph of $\tilde{N}$ is isomorphic to
the marking graph of $N$. In this sense, any marked-controlled reconfigurable net
is equivalent to some Petri net.

Proof. It is straightforward. 

Thus, marked-controlled reconfigurable nets are equivalent to Petri nets, but 
they provide somewhat more compact representations of concurrent systems 
whose structure evolves at runtime. This can be observed in Fig. 6 , which illus- 
trates a fragment of the Petri net that is equivalent to the marked-controlled 
reconfigurable net in Example 2. To facilitate the understanding of the imple- 
mentation of a marked-controlled reconfigurable net with a Petri net, we only 
show how to represent one change in configuration (the change of path in the 
second part of the net that avoids transitions t^ and tg -rewriting rule Rg-) and 
only for the places and transitions involved. With this short fragment, we can 
imagine how big the entire Petri net is and how we can best model the system 
with a marked-controlled reconfigurable net. In general, if the original marked-controlled
reconfigurable net $N = (P, T, \mathcal{R}, \gamma_0)$ has:

- $n$ places,

- $m$ transitions,

- $r$ rewriting rules and

- $k + 1$ accessible configurations,

the obtained equivalent Petri net $\tilde{N} = (\tilde{P}, \tilde{T}, \tilde{F}, \tilde{M}_0)$ consists of:

- $n + (k + 1)$ places and

- $((k + 1) \cdot m) + z$ transitions, where $z = \sum_{i=0}^{k} |\Gamma_i^{\bullet}|$ ($= \sum_{i=0}^{k} |{}^{\bullet}\Gamma_i|$)
is the total number of output (input) arcs from (to) the configurations $\Gamma_i$ in the
configuration graph of $N$, $G(N)$. That is, $z$ is the number of transitions of
$R$.
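
The size count above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `equivalent_net_size` and the encoding of the configuration graph as a dictionary from a configuration index to the list of its successor indices (one list entry per arc) are hypothetical and not part of the original construction.

```python
def equivalent_net_size(n_places, n_transitions, config_graph):
    """Return (places, transitions) of the equivalent Petri net.

    config_graph is an assumed encoding of G(N): it maps each configuration
    index to the list of configurations reachable by one rewriting step,
    with one list entry per arc.
    """
    k_plus_1 = len(config_graph)                         # accessible configurations
    z = sum(len(succ) for succ in config_graph.values())  # arcs of G(N), i.e. |R|
    places = n_places + k_plus_1                         # n + (k + 1)
    transitions = k_plus_1 * n_transitions + z           # ((k + 1) * m) + z
    return places, transitions


# A net with 5 places, 4 transitions and 3 configurations linked by 3 arcs.
print(equivalent_net_size(5, 4, {0: [1], 1: [0, 2], 2: []}))
```

The transition count grows with the product of the number of configurations and the number of original transitions, which is the source of the size increase discussed next.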

This increase in net size suggests that it may be more efficient to implement the
verification methods for Petri nets directly on the marked-controlled
reconfigurable net than on the equivalent Petri net. In any case, it
is possible to study the properties of the net (place boundedness, reachability,
deadlock, liveness and structural properties) using the existing analysis tools for
Petri nets on the equivalent Petri net.

Fig. 6. Part of the Petri net that is equivalent to the marked-controlled reconfigurable
net in Example 2



5 Related Work and Conclusions 

In previous studies [3,11,12,13], we have introduced net rewriting systems, which
are an extension of Petri nets that are suited for the modeling, simulation and 
verification of concurrent systems that are subject to dynamic changes. Net 
rewriting systems can dynamically modify their own structure by rewriting some 
of their components. This rewriting depends only on the net topology. Here 
we have introduced marked-controlled net rewriting systems, an extension of 
net rewriting systems where the rewriting also depends on the net marking. 
Both models are based on two different lines of research that extend the basic 
model of Petri nets, making the description of dynamic changes in concurrent 
systems possible: graph grammars [6,17,4] and Valk’s self-modifying nets [18,19]. 
The rewriting rules of marked-controlled net rewriting systems are very similar 
to productions of graph grammars (they have left- and right-hand sides and 
interfaces), and the application of a rewriting rule is like a direct derivation in 
graph grammars (under certain conditions, once an occurrence (a match) of the 
left-hand side in a graph has been detected, it can be replaced by the right-hand 
side). As in self-modifying nets, we model a system which consists of a collection
of Petri nets, called configurations, and a mechanism that allows the system to 
evolve from one configuration to another under certain circumstances. 

Besides Valk's self-modifying nets, the literature contains other
models that allow for the description of complex, dynamic concurrent systems.
The mobile nets of Asperti and Busi [1] originated from a merging of Petri nets
with the name-managing techniques typical of the $\pi$-calculus [14]; the dynamic nets
of Buscemi and Sassone [5] were inspired by the join calculus [9]. Both of these allow
the dynamic creation of components, as in our proposal. Other models are the
$\Delta$-nets of Gradit and Vernadat [10], a rewriting formalism which integrates the
advantages of Petri nets and graph grammars for, respectively, behavior specification
and topological transformations of a workflow; the POP formalism
introduced by Engelfriet, Leih and Rozenberg [8]; and the related model of cooperating
automata by Badouel, Darondeau and Tokmakoff [2], in which tokens
are active elements with dynamic behavior. For all of them, as in our model, the
description of changes is internal and incremental and their handling is local.
Also, the idea of rewriting underlies all these proposals: the configuration of the
system is described as a Petri net, and a change in configuration is described as
a graph rewriting rule which replaces the part of the system that matches the
left-hand side of the rewriting rule by the corresponding right-hand side. With
respect to expressive power, all of them are Turing-equivalent, as is our model.
The model of marked-controlled net rewriting systems is closer to Petri nets.

The model of marked-controlled reconfigurable nets is introduced here as a 
specific marked-controlled net rewriting system where the transfer relation r is 
a bijection. This model (even if formally equivalent to Petri nets) allows us to
express more precisely systems in which structural dynamic changes can occur.
Moreover, the automatic translation into Petri nets ensures that all the fundamental
properties of Petri nets are still decidable for marked-controlled reconfigurable
nets. This model is thus amenable to automatic verification tools. However, the
expansion into equivalent Petri nets may significantly increase the size of the
net. Therefore, it may be more efficient to implement the verification methods
for Petri nets directly on the original model. This presumes
that we can define the notions of covering graphs, linear invariants, siphons and 
traps directly for a marked-controlled reconfigurable net. These topics are the 
subject of our current research. In contrast, the entire class of marked-controlled 
net rewriting systems is Turing powerful, and thus automatic verification is no 
longer possible in this case. However, this model is still interesting as a modeling 
and simulation tool. For some of the models described above, software tools 
have been developed for editing and simulating systems using a graphical user 
interface. These tools provide software support for the design of prototypes for 
dynamic concurrent systems. We are currently implementing a simulator for 
marked-controlled net rewriting systems.

Abstract. We introduce and study three notions of typeness for automata on infinite
words. For an acceptance-condition class $\gamma$ (that is, $\gamma$ is weak, Büchi, co-Büchi,
Rabin, or Streett), deterministic $\gamma$-typeness asks for the existence of an
equivalent $\gamma$-automaton on the same deterministic structure, nondeterministic $\gamma$-typeness
asks for the existence of an equivalent $\gamma$-automaton on the same structure,
and $\gamma$-powerset-typeness asks for the existence of an equivalent $\gamma$-automaton on
the (deterministic) powerset structure, the one obtained by applying the subset construction.
The notions are helpful in studying the complexity and complication of
translations between the various classes of automata. For example, we prove that
deterministic Büchi automata are co-Büchi type; it follows that a translation from
deterministic Büchi to deterministic co-Büchi automata, when it exists, involves no
blow-up. On the other hand, we prove that nondeterministic Büchi automata are
not co-Büchi type; it follows that a translation from nondeterministic Büchi to
nondeterministic co-Büchi automata, when it exists, must be more complicated
than just redefining the acceptance condition. As a third example, by proving that
nondeterministic co-Büchi automata are Büchi-powerset type, we show that a
translation of nondeterministic co-Büchi to deterministic Büchi automata, when it
exists, can be done by applying the subset construction. We give a complete picture of
typeness for the weak, Büchi, co-Büchi, Rabin, and Streett acceptance conditions,
and discuss its usefulness.



1 Introduction 

Finite automata on infinite objects were first introduced in the 1960s. Motivated by decision
problems in mathematics and logic, Büchi, McNaughton, and Rabin developed a
framework for reasoning about infinite words and infinite trees [Büc62,McN66,Rab69].
The framework has proved to be very powerful. Automata, and their tight relation to
second-order monadic logics, were the key to the solution of several fundamental decision
problems in mathematics and logic [Tho90]. Today, automata on infinite objects
are used for specification and verification of nonterminating systems. In the automata-theoretic
approach to verification, we reduce questions about systems and their specifications
to questions about automata. More specifically, questions such as satisfiability of
specifications and correctness of systems with respect to their specifications are reduced
to questions such as nonemptiness and language containment [VW86,Kur94,VW94].
The automata-theoretic approach separates the logical and the combinatorial aspects
of reasoning about systems. The translation of specifications to automata handles the
logic and shifts all the combinatorial difficulties to automata-theoretic problems. Recent
industrial-strength property-specification languages such as Sugar [BBE+01], ForSpec
[AFF+02], and the recent standard PSL 1.01 [www.accellera.org] include regular expressions
and/or automata, making the automata-theoretic approach even more essential.

Since a run of an automaton on an infinite word does not have a final state, acceptance
is determined with respect to the set of states visited infinitely often during the run.
There are many ways to classify an automaton on infinite words. One is the class of its
acceptance condition. For example, in Büchi automata, some of the states are designated
as accepting states, and a run is accepting iff it visits states from the accepting set
infinitely often [Büc62]. Dually, in co-Büchi automata, a run is accepting iff it visits
states from the accepting set only finitely often. More general are Rabin automata. Here,
the acceptance condition is a set $\alpha = \{(G_1, B_1), \ldots, (G_k, B_k)\}$ of pairs of sets of
states, and a run is accepting if there is a pair $(G_i, B_i)$ for which the set of states visited
infinitely often intersects $G_i$ and is disjoint from $B_i$. The condition $\alpha$ can also be viewed
as a Streett condition, in which case a run is accepting if for all pairs $(G_i, B_i)$, if the
set of states visited infinitely often intersects $G_i$, then it also intersects $B_i$. The number
$k$ of pairs in $\alpha$ is referred to as the index of the automaton. Another way to classify
an automaton is by the type of its branching mode. In a deterministic automaton, the
transition function $\delta$ maps a pair of a state and a letter into a single state. The intuition
is that when the automaton is in state $q$ and it reads a letter $\sigma$, then the automaton moves
to state $\delta(q, \sigma)$, from which it should accept the suffix of the word. When the branching
mode is nondeterministic, $\delta$ maps $q$ and $\sigma$ into a set of states, and the automaton should
accept the suffix of the word from one of the states in the set.

The applications of automata theory in reasoning about systems have led to the
development of new classes of automata. In [MSS86], Muller et al. introduced weak automata.
Weak automata can be viewed as a special case of Büchi or co-Büchi automata
in which every strongly connected component in the graph induced by the structure of
the automaton is either contained in $\alpha$ or is disjoint from $\alpha$. Since reasoning about specifications
is often done by recursively reasoning about their sub-specifications, known
translations of temporal-logic specifications to Büchi automata actually result in weak
automata [MSS86,KVW00,KV98b]. The special structure of weak automata is reflected
in their attractive computational properties and makes them very appealing. Essentially,
while the formulation of acceptance by a Büchi or a co-Büchi automaton involves alternation
between least and greatest fixed-points, no alternation is required for specifying
acceptance by a weak automaton [KVW00]. Deterministic weak automata have recently
been used to represent real numbers. A real number $x$ in base $r$ is represented by a word
of the form $w_I \star w_F$, where $w_I$ is the integer part of $x$ and $w_F$ is the fractional part of $x$, and
both are words over the alphabet $\{0, 1, \ldots, r - 1\}$. This way, for instance, the real number
$5\tfrac{1}{2}$ in base $r = 10$ is represented by $0^* 5 \star 5 0^\omega$ or by $0^* 5 \star 4 9^\omega$. In a similar way, a vector
$v = (x_1, x_2, \ldots, x_n)$ of real numbers is represented by a word of the form $w_I \star w_F$,
where $w_I$ is in $(\{0, 1, \ldots, r - 1\}^n)^*$ and $w_F$ is in $(\{0, 1, \ldots, r - 1\}^n)^\omega$. As real numbers
may have several representations, real vectors may have several representations too. A
real vector automaton (RVA) is a Büchi automaton that, for each vector $v \in \mathbb{R}^n$, accepts
either all the representations of $v$ or none of them. It is proved in [BJW01] that an RVA is a
deterministic weak automaton.

It turns out that different classes of automata have different expressive power. For
example, unlike automata on finite words, where deterministic and nondeterministic
automata have the same expressive power, deterministic Büchi automata are strictly
less expressive than nondeterministic Büchi automata [Lan69]. That is, there exists a
language $\mathcal{L}$ over infinite words such that $\mathcal{L}$ can be recognized by a nondeterministic
Büchi automaton but cannot be recognized by a deterministic Büchi automaton. It also
turns out that some classes of automata may be more succinct than other classes. For
example, translating a nondeterministic co-Büchi automaton into a deterministic one is
possible [MH84], but involves an exponential blow-up. As another example, translating
a nondeterministic Rabin automaton with $n$ states and index $k$ into an equivalent
nondeterministic Büchi automaton may result in an automaton with $O(k \cdot n)$ states, and
if we start with a Streett automaton, the Büchi automaton may have $n \cdot 2^{O(k)}$ states
[SV89]. Note that expressiveness and succinctness depend on both the branching type
of the automaton and the class of its acceptance condition.

There has been extensive research on expressiveness and succinctness of automata
on infinite words [Tho90]. In particular, since reasoning about deterministic automata
is simpler than reasoning about nondeterministic ones, questions like deciding whether
a nondeterministic automaton has an equivalent deterministic one, and the blow-up involved
in determinization, are of particular interest. These questions get further motivation
from the discovery that many natural specifications correspond to the deterministic
fragments: it is shown in [KV98a] that an LTL formula $\psi$ has an equivalent alternation-free
$\mu$-calculus formula iff $\psi$ can be recognized by a deterministic Büchi automaton,
and, as mentioned above, real vector automata are deterministic weak automata.

For deterministic automata, where Büchi and co-Büchi automata are less expressive
than Rabin and Streett automata, researchers have come up with the notion of a deterministic
automaton being Büchi type, namely it has an equivalent Büchi automaton on
the same structure [KPB94]. It is shown in [KPB94] that Rabin automata are Büchi type.
Thus, if a deterministic Rabin automaton $\mathcal{A}$ recognizes a language that can be recognized
by a deterministic Büchi automaton, then $\mathcal{A}$ has an equivalent deterministic Büchi
automaton on the same structure. On the other hand, Streett automata are not Büchi
type: there is a deterministic Streett automaton $\mathcal{A}$ that recognizes a language that can
be recognized by a deterministic Büchi automaton, but all the possibilities of defining a
Büchi acceptance condition on the structure of $\mathcal{A}$ result in an automaton recognizing a
different language.

As discussed in [KPB94], Büchi-typeness is a very useful notion. In particular, a
Büchi-type deterministic automaton can be translated to an equivalent deterministic
Büchi automaton with no blow-up. In this work, we study typeness in general: we
consider both nondeterministic and deterministic automata, for the five classes $\gamma$ of
acceptance conditions described above ($\gamma$ is either Büchi, co-Büchi, Rabin, Streett, or
weak). We define and examine three notions of typeness:

1. Deterministic $\gamma$-typeness asks for which classes of deterministic automata the
existence of some equivalent deterministic $\gamma$-automaton implies the existence of an
equivalent deterministic $\gamma$-automaton on the same structure. For example, we show
that all deterministic automata are weak type.

2. Nondeterministic $\gamma$-typeness asks for which classes of nondeterministic automata
the existence of some equivalent nondeterministic $\gamma$-automaton implies the existence
of an equivalent nondeterministic $\gamma$-automaton on the same structure. For example,
we show that nondeterministic Büchi automata are not co-Büchi type. This answers
a question on translating Büchi to co-Büchi automata that was left open in [KV98a].

3. $\gamma$-powerset-typeness asks for which classes of nondeterministic automata the
existence of some equivalent deterministic $\gamma$-automaton implies the existence of an
equivalent deterministic $\gamma$-automaton on the structure obtained by applying the subset
construction to the original automaton. For example, while deterministic Rabin
automata are Büchi-type, nondeterministic Rabin automata are not Büchi-powerset-type.
The notion of powerset-typeness is important for the study of the blow-up
involved in the translation of automata to equivalent deterministic ones. While for
some classes a lower bound is known, powerset-typeness implies a $2^n$
upper bound for other classes. We also examine finite-typeness for nondeterministic
Büchi automata, that is, cases where the limit language of the automaton when viewed as
an automaton on finite words is equivalent to that of the Büchi automaton, and we
relate finite-typeness with powerset-typeness.

Our results, along with previously known results, are described in Figures 2, 3, and 5. 



2 Preliminaries 

Given an alphabet $\Sigma$, an infinite word over $\Sigma$ is an infinite sequence $w = \sigma_0 \cdot \sigma_1 \cdot \sigma_2 \cdots$
of letters in $\Sigma$. We denote the set of all infinite words over $\Sigma$ by $\Sigma^\omega$. A language $\mathcal{L}$ is a
set of words from $\Sigma^\omega$. An automaton over infinite words is a tuple $\mathcal{A} = (\Sigma, Q, \delta, Q_0, \alpha)$,
where $\Sigma$ is the input alphabet, $Q$ is a finite set of states, $\delta : Q \times \Sigma \to 2^Q$ is a transition
function, $Q_0 \subseteq Q$ is a set of initial states, and $\alpha$ is an acceptance condition, which is a
condition that defines a subset of $Q^\omega$. We define several acceptance conditions below.
Intuitively, $\delta(q, \sigma)$ is the set of states that $\mathcal{A}$ may move into when it is in the state $q$ and
it reads the letter $\sigma$. The automaton $\mathcal{A}$ may have several initial states and the transition
function may specify many possible transitions for each state and letter, and hence we
say that $\mathcal{A}$ is nondeterministic. In the case where $|Q_0| = 1$ and for every $q \in Q$ and
$\sigma \in \Sigma$ we have that $|\delta(q, \sigma)| = 1$, we say that $\mathcal{A}$ is deterministic.

Given an input infinite word $w = \sigma_0 \cdot \sigma_1 \cdot \sigma_2 \cdots \in \Sigma^\omega$, a run of $\mathcal{A}$ on $w$ can be
viewed as a function $r : \mathbb{N} \to Q$ where $r(0) \in Q_0$, i.e., the run starts in one of the
initial states, and for every $i \geq 0$, we have that $r(i + 1) \in \delta(r(i), \sigma_i)$, i.e., the run obeys
the transition function. Note that while a deterministic automaton has a single run on
an input word $w$, a nondeterministic automaton may have several runs on $w$ or none
at all. Each run $r$ induces a set $\mathit{inf}(r)$ of states that $r$ visits infinitely often. Formally,
$\mathit{inf}(r) = \{q \in Q : \text{for infinitely many } i \in \mathbb{N}, \text{ we have } r(i) = q\}$. As $Q$ is finite,
it is guaranteed that $\mathit{inf}(r) \neq \emptyset$. The run $r$ is accepting iff the set $\mathit{inf}(r)$ satisfies
the acceptance condition $\alpha$. A run that is not accepting is rejecting. We consider the
following acceptance conditions.

- A set $S$ satisfies a Büchi acceptance condition $\alpha \subseteq Q$ if and only if $S \cap \alpha \neq \emptyset$.

- A set $S$ satisfies a co-Büchi acceptance condition $\alpha \subseteq Q$ if and only if $S \cap \alpha = \emptyset$.

- A set $S$ satisfies a Rabin acceptance condition $\alpha = \{(G_1, B_1), \ldots, (G_k, B_k)\} \subseteq 2^Q \times 2^Q$
if and only if there exists a pair $(G_i, B_i) \in \alpha$ for which $S \cap G_i \neq \emptyset$ and
$S \cap B_i = \emptyset$.

- A set $S$ satisfies a Streett acceptance condition $\alpha = \{(G_1, B_1), \ldots, (G_k, B_k)\} \subseteq 2^Q \times 2^Q$
if and only if for all pairs $(G_i, B_i) \in \alpha$ we have that $S \cap G_i = \emptyset$ or
$S \cap B_i \neq \emptyset$.



Note that the Büchi acceptance condition is dual to the co-Büchi acceptance condition:
a set $S$ satisfies a Büchi acceptance condition $\alpha$ iff $S$ does not satisfy $\alpha$ as a
co-Büchi acceptance condition. Similarly, the Rabin acceptance condition is dual to the
Streett acceptance condition. The number $k$ appearing in the Rabin and Streett conditions
is called the index of the automaton. An automaton $\mathcal{A}$ accepts an input word $w$ iff there
exists an accepting run of $\mathcal{A}$ on $w$. The language of $\mathcal{A}$, denoted $\mathcal{L}(\mathcal{A})$, is the set of all
infinite words that $\mathcal{A}$ accepts.
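
The four acceptance conditions can be sketched as predicates on the set of states visited infinitely often. This is a minimal illustration only: the function names and the encoding of states as elements of Python sets (and of Rabin/Streett conditions as lists of pairs of sets) are assumptions made for the example.

```python
def buchi(inf, alpha):
    """inf meets alpha."""
    return bool(inf & alpha)


def co_buchi(inf, alpha):
    """inf is disjoint from alpha."""
    return not (inf & alpha)


def rabin(inf, pairs):
    """Some pair (G, B) has inf meeting G and disjoint from B."""
    return any(bool(inf & g) and not (inf & b) for g, b in pairs)


def streett(inf, pairs):
    """Every pair (G, B) has inf disjoint from G or meeting B."""
    return all(not (inf & g) or bool(inf & b) for g, b in pairs)


inf = {1, 2}
pairs = [({1}, {3}), ({2}, {2})]
# Duality: a set satisfies alpha as a Buchi condition iff it does not satisfy
# it as a co-Buchi condition, and likewise for Rabin versus Streett.
print(buchi(inf, {2}), co_buchi(inf, {2}))
print(rabin(inf, pairs), streett(inf, pairs))
```

Each pair of printed values is complementary, which is the duality stated above.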

The transition function $\delta$ induces a relation $R_\delta \subseteq Q \times Q$, where $R_\delta(q, q')$ iff there is
$\sigma \in \Sigma$ with $q' \in \delta(q, \sigma)$. Accordingly, the automaton $\mathcal{A}$ induces a graph $G_\mathcal{A} = (Q, R_\delta)$.
For two states $q$ and $q'$ of $\mathcal{A}$, we say that $q'$ is reachable from $q$ if there is a (possibly
empty) path in $G_\mathcal{A}$ from $q$ to $q'$. A strongly connected component (SCC, for short) in
$G_\mathcal{A}$ is a set $C \subseteq Q$ such that for all states $q$ and $q'$ in $C$, we have that $q$ is reachable from
$q'$. A maximal strongly connected component (MSCC, for short) is an SCC $C$ that is
maximal in the sense that we cannot add states to $C$ and stay with an SCC. Thus, for all
nonempty $C' \subseteq Q \setminus C$, the set $C \cup C'$ is not an SCC. Note that a run of an automaton $\mathcal{A}$ eventually
gets trapped in an MSCC of $G_\mathcal{A}$. We say that a Büchi automaton $\mathcal{A}$ is weak if for each
MSCC $C$ of $G_\mathcal{A}$, either $C \subseteq \alpha$ (in which case we say that $C$ is an accepting component)
or $C \cap \alpha = \emptyset$ (in which case we say that $C$ is a rejecting component). Note that a weak
automaton can be viewed as both a Büchi and a co-Büchi automaton. Indeed, a run of
$\mathcal{A}$ visits $\alpha$ infinitely often iff it gets trapped in an accepting component, which happens
iff it visits states in $Q \setminus \alpha$ only finitely often.
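
The weakness condition can be checked mechanically by computing the MSCCs of the induced graph. The sketch below is an illustration under assumptions: the transition function is encoded as a dictionary from (state, letter) pairs to sets of states, the helper names are hypothetical, and MSCCs are found by plain mutual reachability rather than by a linear-time algorithm.

```python
def reachable(delta, q):
    """States reachable from q in the induced graph (q itself included,
    since the definition allows the empty path)."""
    seen, stack = {q}, [q]
    while stack:
        s = stack.pop()
        for (src, _letter), targets in delta.items():
            if src == s:
                for t in targets:
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
    return seen


def msccs(states, delta):
    """Maximal strongly connected components, by mutual reachability."""
    reach = {q: reachable(delta, q) for q in states}
    components, assigned = [], set()
    for q in states:
        if q not in assigned:
            comp = frozenset(p for p in reach[q] if q in reach[p])
            components.append(comp)
            assigned |= comp
    return components


def is_weak(states, delta, alpha):
    """Each MSCC is contained in alpha or disjoint from alpha."""
    return all(c <= alpha or not (c & alpha) for c in msccs(states, delta))


states = [0, 1, 2]
delta = {(0, "a"): {1}, (1, "a"): {0, 2}, (2, "a"): {2}}
print(msccs(states, delta))
print(is_weak(states, delta, {2}), is_weak(states, delta, {1}))
```

In the example, states 0 and 1 form one MSCC and state 2 another, so the accepting set {2} is weak while {1} splits a component.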

We denote the different types of automata by three-letter acronyms in $\{D, N\} \times
\{B, C, R, S, W\} \times \{W, T\}$. The first letter stands for the branching mode of the automaton
(deterministic or nondeterministic); the second letter stands for the acceptance-condition
type (Büchi, co-Büchi, Rabin, Streett, or weak). The third letter stands for the objects on
which the automata run (words or trees). For Rabin and Streett automata, we sometimes
also indicate the index of the automaton. In this way, for example, NBW are nondeterministic
Büchi word automata, and DRW[1] are deterministic Rabin automata with
index 1.

Expressiveness and Typeness 

For two automata $\mathcal{A}$ and $\mathcal{A}'$, we say that $\mathcal{A}$ and $\mathcal{A}'$ are equivalent if $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{A}')$. For
an automaton type $\beta$ (e.g., DBW) and an automaton $\mathcal{A}$, we say that $\mathcal{A}$ is $\beta$-realizable
if there is a $\beta$-automaton equivalent to $\mathcal{A}$. In Figure 1 below we describe the known
expressiveness hierarchy for automata on infinite words. As described in the figure,
DRW and DSW are as expressive as NRW, NSW, and NBW, which recognize all $\omega$-regular
languages [McN66]. On the other hand, DBW are strictly less expressive than
NBW, and so are DCW. In fact, since by dualizing a Büchi automaton we get a co-Büchi
automaton, the two internal ovals complement each other. The intersection of DBW and
DCW is DWW (note that while a DWW is clearly both a DBW and a DCW, the other
direction is not trivial, and is proven in [BJW01]). Finally, NCW can be determinized
(when applied to universal Büchi automata, the translation in [MH84] of alternating
Büchi automata into NBW results in DBW; by dualizing it, one gets a translation of
NCW to DCW). In addition to the results described in the figure, the index of DRW
and DSW also induces a hierarchy; thus DRW[$k + 1$] are strictly more expressive than
DRW[$k$], and similarly for DSW [Kam85].




Consider an automaton $\mathcal{A} = (\Sigma, Q, \delta, Q_0, \alpha)$. We refer to $(Q, \delta, Q_0)$ as the structure
of the automaton. The powerset structure induced by $\mathcal{A}$ is $\mathcal{P}(\mathcal{A}) = (2^Q, \delta_p, \{Q_0\})$,
where for all $S \in 2^Q$ and $\sigma \in \Sigma$, we have that $\delta_p(S, \sigma) = \bigcup_{s \in S} \delta(s, \sigma)$. Thus, the
powerset structure is obtained by the usual subset construction [RS59].
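
As an illustration, the subset construction can be sketched as follows. Two assumptions are made for the example: the transition function is a dictionary from (state, letter) pairs to sets of states, and only the part of the powerset structure reachable from the initial subset is built, whereas the definition above ranges over all of $2^Q$. The function name `powerset_structure` is hypothetical.

```python
def powerset_structure(alphabet, delta, initial):
    """Return (states, delta_p, start) for the reachable powerset structure.

    delta_p maps (subset, letter) to the union of the successors of the
    members of the subset, so it is deterministic by construction.
    """
    start = frozenset(initial)
    seen, todo, delta_p = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for letter in alphabet:
            target = frozenset(t for s in subset
                               for t in delta.get((s, letter), ()))
            delta_p[(subset, letter)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen, delta_p, start


delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): set(), (1, "b"): {1}}
states, delta_p, start = powerset_structure("ab", delta, {0})
print(sorted(sorted(s) for s in states))
```

Every (subset, letter) pair has exactly one successor subset, which is why any automaton defined on this structure is deterministic.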

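For an acceptance-condition class $\gamma$ (e.g., Büchi), we say that $\mathcal{A}$ is $\gamma$-type if $\mathcal{A}$ has
an equivalent $\gamma$-automaton with the same structure as $\mathcal{A}$. That is, there is an automaton
$\mathcal{A}' = (\Sigma, Q, \delta, Q_0, \alpha')$ such that $\alpha'$ is an acceptance condition of class $\gamma$ and $\mathcal{L}(\mathcal{A}') =
\mathcal{L}(\mathcal{A})$. We say that $\mathcal{A}$ is $\gamma$-powerset-type if $\mathcal{A}$ has an equivalent $\gamma$-automaton with the
same structure as the powerset structure of $\mathcal{A}$. That is, there is an automaton $\mathcal{A}_p =
(\Sigma, 2^Q, \delta_p, \{Q_0\}, \alpha_p)$ such that $\alpha_p$ is an acceptance condition of class $\gamma$ and $\mathcal{L}(\mathcal{A}_p) =
\mathcal{L}(\mathcal{A})$. Note that the automaton $\mathcal{A}_p$ is deterministic.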



3 Typeness for Deterministic Automata 

In this section we consider the following problem: given two acceptance-condition types
$\beta$ and $\gamma$, is it true that every D$\beta$W that is D$\gamma$W-realizable is also $\gamma$-type? We then say that
D$\beta$W are $\gamma$-type. In other words, D$\beta$W are $\gamma$-type if every deterministic $\beta$-automaton
that has an equivalent deterministic $\gamma$-automaton also has an equivalent deterministic
$\gamma$-automaton on the same structure.

Our results are described in Figure 2 below. Some results are immediate. For example,
since the Büchi and the co-Büchi acceptance conditions are special cases of Rabin and
Streett conditions (a Büchi condition $\alpha$ is equivalent to the Rabin condition $\{(\alpha, \emptyset)\}$
and to the Streett condition $\{(Q, \alpha)\}$, and dually for co-Büchi), it is clear that DBW and
DCW are Rabin-type and Streett-type. Similarly, since weak automata can be viewed
as Büchi or co-Büchi automata, they can also be viewed as a special case of Rabin
and Streett automata. Thus, DWW are $\gamma$-type for all the types $\gamma$ we consider. Such
cases, where a translation of the acceptance condition exists and is independent of the
automaton, appear in the table as YES with no accompanying reference. Some results are known, or obtained easily
by dualizing known results, and the table contains the appropriate reference. Below we
prove the new results.





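type \ class | DWW | DBW           | DCW           | DRW                    | DSW
-------------+-----+---------------+---------------+------------------------+-------------------------
DWW          |     | YES (Lemma 1) | YES (Lemma 1) | YES (Lemma 1)          | YES (Lemma 1)
DBW          | YES |               | YES (Lemma 2) | YES [KPB94]            | NO [KPB94]
DCW          | YES | YES (Lemma 2) |               | NO (dualizing [KPB94]) | YES (dualizing [KPB94])
DRW          | YES | YES           | YES           |                        | NO (Lemma 3)
DSW          | YES | YES           | YES           | NO (Lemma 3)           |

DRW[$k$] are not Rabin[$k - 1$]-type, DSW[$k$] are not Streett[$k - 1$]-type (Lemma 4).

Fig. 2. Typeness for deterministic automata.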



Lemma 1. D$\beta$W are weak-type for all acceptance-condition types $\beta$.

Proof. In [BJW01], the authors introduce the notion of a deterministic automaton being
inherently weak (the definition in [BJW01] is for DBW, and is easily extended to D$\beta$W
for all acceptance-condition types $\beta$). A D$\beta$W is inherently weak if none of its reachable
MSCCs contains both accepting and rejecting SCCs. It is easy to see that an inherently
weak automaton has an equivalent DWW on the same structure. Indeed, by definition,
each of the MSCCs of the automaton can be made accepting or rejecting according to the
classification of all its SCCs.

Let $\mathcal{A}$ be a DWW-realizable D$\beta$W. Then, $\mathcal{A}$ is both DBW-realizable and DCW-realizable.
Assume by way of contradiction that $\mathcal{A}$ is not weak type. Then, $\mathcal{A}$ is
not inherently weak, so there exists a reachable MSCC $C$ of $\mathcal{A}$ such that $C$ contains
both an accepting SCC $S$ and a rejecting SCC $R$. Since $\mathcal{A}$ is DBW-realizable, then, by
[Lan69], every SCC $S' \supseteq S$ is accepting. In particular, $C$ is accepting. Dually, since $\mathcal{A}$
is DCW-realizable, every SCC $R' \supseteq R$ is rejecting. In particular, $C$ is rejecting. It
follows that $C$ is both accepting and rejecting, and we reach a contradiction.

We note that [BJW01] prove that every DBW that accepts a language in $F_\sigma \cap G_\delta$ is
inherently weak. The proof there, however, does not make direct use of [Lan69], and
is therefore much more complicated.

Lemma 2. DCW are Büchi-type, and DBW are co-Büchi-type.

Proof. Since a DCW can be viewed as a DRW, and DRW are Büchi-type [KPB94],
DCW are Büchi-type too. Dually, DBW are co-Büchi-type.

Note that if a DCW $\mathcal{A}$ is DBW-realizable, then it is also DWW-realizable. Indeed,
by [BJW01], DCW $\cap$ DBW = DWW. Hence, by Lemma 1, $\mathcal{A}$ has an equivalent deterministic
weak automaton on the same structure. Thus, Lemma 2 can be strengthened: a
DCW that is DBW-realizable (dually, a DBW that is DCW-realizable) has an equivalent
deterministic weak automaton on the same structure.

Lemma 3. DRW are not Streett-type, and DSW are not Rabin-type.

Proof. Since DSW can recognize all $\omega$-regular languages, DSW being Rabin-type means
that every DSW has an equivalent DRW on the same structure. In [Löd99], Löding shows
that a translation of a DSW to an equivalent DRW may involve an exponential blow-up,
thus typeness cannot hold. The argument for DRW is dual.

In addition to the results in the table, we prove that the expressiveness hierarchy
known for the indices of DRW and DSW induces a typeness hierarchy:

Lemma 4. For all $k \ge 2$, we have that DRW[$k$] are not Rabin[$k-1$]-type, and DSW[$k$]
are not Streett[$k-1$]-type.

Proof. Let $\Sigma_k = \{1, 2, \ldots, k\}$. Consider the language $L_k$ of exactly all words
containing infinitely many $i$'s, for all $1 \le i \le k$. Consider the DSW[$k$]
$A_k = \langle \Sigma_k, \Sigma_k, \delta, \{1\}, \alpha_k \rangle$, with $\delta(q, i) = i$ for all $q, i \in \Sigma_k$, and
$\alpha_k = \{\langle \Sigma_k, \{1\} \rangle, \langle \Sigma_k, \{2\} \rangle, \ldots, \langle \Sigma_k, \{k\} \rangle\}$. Thus, whenever $A_k$ reads a letter
$i$, it moves to state $i$, and the acceptance condition requires an accepting run to visit all
states infinitely often. It is easy to see that $A_k$ recognizes $L_k$. Also, since $L_k$ can be
viewed as the intersection of $k$ DBWs $D_i$, each for the language "infinitely many $i$'s,"
we know that $L_k$ is DBW-recognizable, and hence also DSW[$k-1$]-realizable. On the
other hand, it is impossible to define a Streett[$k-1$] acceptance condition $\alpha'_k$ so that
$A_k$ with condition $\alpha'_k$ recognizes $L_k$. To see this, note that for each letter $i \in \Sigma_k$, the
DSW $A_k$ accepts $(1 \cdot 2 \cdots k)^\omega$ and rejects $(1 \cdot 2 \cdots (i-1) \cdot (i+1) \cdots k)^\omega$. For that, $\alpha'_k$
must contain, for each $i \in \Sigma_k$, a pair $\langle G_i, B_i \rangle$ such that $G_i \cap \Sigma_k \neq \emptyset$ and $B_i = \{i\}$.
Thus, $\alpha'_k$ must contain at least $k$ pairs, and we are done. It follows that DSW[$k$] are
not Streett[$k-1$]-type. The argument for Rabin automata is dual, and considers the
complement of $L_k$.
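The counting argument in the proof can be checked exhaustively for small $k$. The following is a minimal sketch, not part of the original proof: it relies on the observation that $A_k$ enters state $i$ on letter $i$, so the set of states visited infinitely often can be any nonempty subset of $\Sigma_k$, and a condition recognizes $L_k$ iff it accepts exactly the full set. The helper names (`subsets`, `recognizes_lk`, `smaller_condition_exists`) are illustrative.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def streett_accepts(inf, pairs):
    """Streett acceptance: for every pair (G, B), if the run visits G
    infinitely often then it also visits B infinitely often."""
    return all(not (inf & g) or bool(inf & b) for g, b in pairs)

def recognizes_lk(k, pairs):
    """True iff A_k with acceptance condition `pairs` recognizes L_k.

    The states visited infinitely often are exactly the letters read
    infinitely often; L_k requires that set to be all of Sigma_k."""
    sigma = frozenset(range(1, k + 1))
    return all(streett_accepts(x, pairs) == (x == sigma)
               for x in subsets(sigma) if x)

def alpha(k):
    """The k-pair condition {(Sigma_k, {1}), ..., (Sigma_k, {k})}."""
    sigma = frozenset(range(1, k + 1))
    return [(sigma, frozenset({i})) for i in sorted(sigma)]

def smaller_condition_exists(k):
    """Exhaustively search all Streett conditions with k - 1 pairs."""
    all_pairs = list(product(subsets(range(1, k + 1)), repeat=2))
    return any(recognizes_lk(k, cond)
               for cond in product(all_pairs, repeat=k - 1))
```

For $k = 2$ and $k = 3$ the search space is small (16 and 4096 conditions), so `smaller_condition_exists` terminates quickly; larger $k$ would need the pair-counting argument of the proof rather than brute force.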
4 Typeness for Nondeterministic Automata

In this section we consider the following problem: given two acceptance-condition types
$\beta$ and $\gamma$, is it true that every N$\beta$W that is N$\gamma$W-realizable is also $\gamma$-type? We then say
that N$\beta$W are $\gamma$-type. In other words, N$\beta$W are $\gamma$-type if every nondeterministic
$\beta$-automaton that has an equivalent nondeterministic $\gamma$-automaton also has an equivalent
nondeterministic $\gamma$-automaton on the same structure.

Our results are described in Figure 3 below. As in Section 3, some results follow
immediately from translations of the acceptance condition. The new results are proven
in Lemmas 5, 6, and 7; some further entries follow from applying translations to results
proven in the lemmas.





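      | NWW | NBW          | NCW          | NRW                 | NSW
------+-----+--------------+--------------+---------------------+---------------------
  NWW |     | NO (Lemma 5) | NO (Lemma 6) | NO (Lemmas 5 and 6) | NO (Lemmas 5 and 6)
  NBW | YES |              | NO (Lemma 6) | NO (Lemma 6)        | NO (Lemma 6)
  NCW | YES | NO (Lemma 5) |              | NO (Lemma 5)        | NO (Lemma 5)
  NRW | YES | YES          | YES          |                     | NO (Lemma 7)
  NSW | YES | YES          | YES          | NO (Lemma 7)        |

Fig. 3. Typeness for nondeterministic automata.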



Lemma 5. NBW are neither co-Büchi- nor weak-type.

Proof. Consider the NBW $A_1$ described in Figure 4. The NBW recognizes the language
$L = a^* \cdot b \cdot (a + b)^\omega$ (at least one $b$). This language is in NWW and NCW, yet it is easy to see
that there is no NCW (and hence also no NWW) recognizing $L$ on the same structure.

We note that the automaton in Figure 4 is a single-run automaton: every word
accepted by it has a single accepting run. This is of particular interest in the context
of specification and verification, as the NBW described in [VW94] for LTL formulas
are single-run automata. Our example shows that even such automata are neither
co-Büchi- nor weak-type. It is shown in [KV98a] that an LTL formula $\psi$ has an equivalent
alternation-free $\mu$-calculus formula $\psi'$ iff the language of $\psi$ can be recognized by a
DBW $A_\psi$. The construction of the formula $\psi'$ in [KV98a] goes via $A_\psi$, and therefore it
involves a doubly-exponential blow-up. The construction of $\psi'$ may also go via an NCW
$A_{\neg\psi}$ for $\neg\psi$. While $\psi'$ is of length linear in the size of $A_{\neg\psi}$, the best known translation of
LTL to NCW (when it exists) actually constructs a DCW and is doubly-exponential. It is
conjectured in [KV98a] that single-run NBW can be translated to NCW with only a linear
blow-up, leading to an exponential translation of LTL to alternation-free $\mu$-calculus.
In particular, the question of obtaining the NCW by modifying the acceptance condition
of the NBW is left open in [KV98a]. Our result here answers the question negatively.

Fig. 4. NBWs for $a^* \cdot b \cdot (a + b)^\omega$.

We also note that NCW-typeness and weak-typeness do not coincide. Figure 4 also
describes a different NBW, $A_2$, for $L$. This NBW is NCW-type: an NCW with the
same structure but with the acceptance condition $\alpha = \{q_0, q_1\}$ accepts $L$. Yet, it is not
weak-type.

Lemma 6. NCW are neither Büchi- nor weak-type.

Proof. Consider the two-state DCW $A$ for the language $L$ of all words with finitely
many $a$'s. Since $L$ is not DBW-realizable, and $A$ is deterministic, $A$ is not Büchi-type.
The language $L$ is NWW-realizable. But again, since $A$ is deterministic and $L$ is not
DWW-realizable, it is not weak-type.
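The Büchi half of this argument can be illustrated by enumerating all Büchi conditions on a two-state structure. The sketch below assumes a particular structure that the text does not spell out: state `qa` is entered on every `a` and state `qb` on every `b`, with co-Büchi condition $\alpha = \{qa\}$. Words are ultimately periodic, and only the period matters for acceptance.

```python
from itertools import combinations

STATES = ("qa", "qb")  # assumed structure: letter x always leads to state "q" + x

def inf_states(period):
    """States visited infinitely often on any word of the form u . period^omega."""
    return {"q" + letter for letter in period}

def cobuchi_accepts(period, alpha=frozenset({"qa"})):
    """Co-Buchi acceptance: alpha is visited only finitely often."""
    return not (inf_states(period) & alpha)

def buchi_accepts(period, f):
    """Buchi acceptance: f is visited infinitely often."""
    return bool(inf_states(period) & f)

def buchi_conditions_matching(periods):
    """All Buchi conditions on the same structure that agree with the DCW
    on every given period."""
    candidates = [frozenset(c) for r in range(len(STATES) + 1)
                  for c in combinations(STATES, r)]
    return [f for f in candidates
            if all(buchi_accepts(p, f) == cobuchi_accepts(p) for p in periods)]
```

With the three periods `"a"`, `"b"`, and `"ab"` no candidate survives: accepting $b^\omega$ forces `qb` into the Büchi set, which then wrongly accepts $(ab)^\omega$.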



Lemma 7. NRW are not Streett-type, and NSW are not Rabin-type. 

Proof. By Lemma 3, DRW are not Streett-type. Hence, there are DRW that are DSW- 
realizable but do not have an equivalent DSW on the same structure. Since DRW are a 
special case of NRW, it follows that NRW are not Streett-type. The proof for NSW not 
being Rabin-type is similar. 

By Lemma 4, DRW[$k$] are not Rabin[$k-1$]-type, and DSW[$k$] are not Streett[$k-1$]-type,
for all $k \ge 2$. Thus, following the same considerations as in the proof of Lemma 7,
we get that NRW[$k$] are not Rabin[$k-1$]-type, and NSW[$k$] are not Streett[$k-1$]-type.

5 Powerset-Typeness for Nondeterministic Automata

In this section we consider the following problem: given two acceptance-condition types
$\beta$ and $\gamma$, is it true that every N$\beta$W that is D$\gamma$W-realizable is also $\gamma$-powerset-type?
We then say that N$\beta$W are $\gamma$-powerset-type. In other words, N$\beta$W are $\gamma$-powerset-type if every
nondeterministic $\beta$-automaton that has an equivalent deterministic $\gamma$-automaton also
has an equivalent deterministic $\gamma$-automaton on the powerset structure.

Our results are described in Figure 5 below. Since $A = \mathcal{P}(A)$ for a deterministic
automaton $A$, we know that N$\beta$W cannot be $\gamma$-powerset-type if D$\beta$W are not $\gamma$-type.
Thus, the negative cases in Figure 2 immediately induce negative cases here. In particular,
for all $k \ge 2$, we have that NRW[$k$] are not Rabin[$k-1$]-powerset-type, and NSW[$k$]
are not Streett[$k-1$]-powerset-type.

Fig. 5. Powerset-typeness for nondeterministic automata.



Lemma 8. NWW and NCW are Büchi-powerset-type.

Proof. Consider an NCW $A$. Recall that $A$ is DCW-realizable. Therefore, if $A$ is
DBW-realizable, then it is also DWW-realizable. Hence, as NCW are weak-powerset-type,
there is a DWW, and thus also a DBW, equivalent to $A$ with structure $\mathcal{P}(A)$. Thus,
NCW are Büchi-powerset-type. Since NWW are a special case of NCW, the result for
NWW follows.

Lemma 9. NBW are not Büchi-powerset-type.

Proof. The NBW $A$ in Figure 6 recognizes the language of all words with infinitely
many occurrences of the subword $ab$. The language can be recognized by a DBW, yet
no DBW for it can be defined on top of $\mathcal{P}(A)$.



Fig. 6. An NBW for $((a + b)^* \cdot a \cdot b)^\omega$ that is not Büchi-powerset-type.

Lemma 10. NWW are neither co-Büchi-, Rabin-, nor Streett-powerset-type.

Proof. The NWW $A$ in Figure 7 recognizes the language of all words with an $(a \cdot b)^\omega$
tail. The language can be recognized by a DCW, and hence also by a DRW and DSW.
Yet, no DCW, DRW, or DSW for it can be defined on top of $\mathcal{P}(A)$.

Fig. 7. An NWW for $(a + b)^* \cdot (a \cdot b)^\omega$ that is neither co-Büchi-, Rabin-, nor Streett-powerset-type.



5.1 From NBW to NFW 

Recall that DBW are strictly less expressive than NBW. A language $L \subseteq \Sigma^\omega$ can be
recognized by a DBW iff there is a regular language $R \subseteq \Sigma^*$ such that $L = \lim R$,
that is, $w \in L$ iff $w$ has infinitely many prefixes in $R$ [Lan69]. An open problem is to
construct, given an NBW $A$ for $L$ such that $A$ is DBW-realizable, an NFW $A'$ for the
corresponding $R$. An immediate $2^{O(n \log n)}$ upper bound follows from the
determinization construction of [Saf88] for $A$ (since DRW are Büchi-type, the DRW
constructed in [Saf88] can be converted to a DBW on the same structure). While the
$2^{O(n \log n)}$ blow-up in determinization is tight [Mic88, Löd99], no super-linear lower
bound is known for the translation of $A$ to $A'$. The challenges in this problem are similar
to those in the problem of translating an NBW that is NCW-realizable to an equivalent
NCW. While a $2^{O(n \log n)}$ upper bound is immediate, no super-linear lower bound is
known.
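The limit operator is easy to evaluate on ultimately periodic words when $R$ is given by a deterministic finite automaton. The following minimal sketch is an illustration only; the dictionary encoding of the transition function and the name `in_limit` are assumptions, and the period is assumed to be nonempty.

```python
def in_limit(delta, start, accepting, prefix, period):
    """Decide whether prefix . period^omega has infinitely many prefixes in R,
    where R is the language of the DFA (delta, start, accepting).

    delta maps (state, letter) to the next state."""
    state = start
    for letter in prefix:
        state = delta[state, letter]
    seen = {}   # state at the start of a pass over the period -> index of that pass
    hits = []   # hits[i] is True iff pass i visits an accepting state
    while state not in seen:
        seen[state] = len(hits)
        hit = False
        for letter in period:
            state = delta[state, letter]
            hit = hit or state in accepting
        hits.append(hit)
    # Passes from seen[state] onwards repeat forever; earlier ones occur once.
    return any(hits[seen[state]:])
```

For example, with a two-state DFA for "words ending in $b$", the limit language is "infinitely many $b$'s", so $(ab)^\omega$ is in the limit and $bbb \cdot a^\omega$ is not.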

Consider an NBW $A$. Let $A_{fin}$ be $A$ viewed as an NFW. We say that $A$ is finite-type
if $L(A) = \lim L(A_{fin})$. Note that for a finite-type NBW, the translation to NFW is linear,
and the NFW is on the same structure as the NBW.

The notion of powerset-typeness turns out to be related to finite-typeness: for an
automaton $A = \langle \Sigma, Q, \delta, Q_0, \alpha \rangle$, let $\mathcal{S}(A) = \langle \Sigma, 2^Q, \delta_{\mathcal{P}}, \{Q_0\}, \alpha_{\mathcal{P}} \rangle$ be the automaton
obtained from $A$ by applying to it the subset construction. Thus, the structure of $\mathcal{S}(A)$ is
the powerset-structure of $A$, and a state is in $\alpha_{\mathcal{P}}$ if its intersection with $\alpha$ is not empty. We
refer to $\mathcal{S}(A)$ as the subset automaton of $A$. Clearly, for an NFW $A$, we have that $A$ and
$\mathcal{S}(A)$ are equivalent [RS59]. We say that an NBW $A$ is Büchi-subset-type if $A$ and $\mathcal{S}(A)$
are equivalent. Note that if $A$ is Büchi-subset-type, then it is also Büchi-powerset-type,
but as we shall see below, the other direction does not necessarily hold.
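The subset automaton described above can be sketched as follows. This is an illustrative implementation under two assumptions: the transition relation is encoded as a dictionary from (state, letter) to a set of successors, and only the reachable part of the powerset structure is built. The example NBW is hypothetical and is not one of the automata in the figures.

```python
def subset_automaton(alphabet, delta, initial, alpha):
    """Reachable part of the subset automaton of a nondeterministic automaton.

    Returns (states, delta_p, start, alpha_p), where a subset state is in
    alpha_p iff it intersects alpha."""
    start = frozenset(initial)
    states, todo, delta_p = {start}, [start], {}
    while todo:
        s = todo.pop()
        for letter in alphabet:
            t = frozenset(q2 for q in s for q2 in delta.get((q, letter), ()))
            delta_p[s, letter] = t
            if t not in states:
                states.add(t)
                todo.append(t)
    alpha_p = {s for s in states if s & set(alpha)}
    return states, delta_p, start, alpha_p

# Hypothetical NBW for "finitely many a's": state 0 loops on a and b,
# guesses the last a by moving to state 1 on b, and state 1 loops on b only.
nbw_delta = {(0, "a"): {0}, (0, "b"): {0, 1}, (1, "b"): {1}}
states, delta_p, start, alpha_p = subset_automaton("ab", nbw_delta, {0}, {1})
```

In this example the subset automaton, read as a DBW, visits the accepting subset `{0, 1}` after every `b`, so it accepts all words with infinitely many `b`'s. That differs from the language of the NBW, so this particular NBW is not Büchi-subset-type.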

Theorem 1. An NBW is Büchi-subset-type iff it is finite-type.

Proof. Assume first that $A$ is not Büchi-subset-type. Since $\mathcal{S}(A)$ is a DBW, then
$L(\mathcal{S}(A)) = \lim L(\mathcal{S}(A)_{fin})$. Since $A_{fin}$ is an NFW, then $L(A_{fin}) = L(\mathcal{S}(A)_{fin})$. It
follows that $L(\mathcal{S}(A)) = \lim L(A_{fin})$. Since $A$ is not Büchi-subset-type, $L(A) \neq L(\mathcal{S}(A))$.
It follows that $L(A) \neq \lim L(A_{fin})$, thus $A$ is not finite-type.

Assume now that $A$ is Büchi-subset-type. For every NBW $A$, we have that $L(A) \subseteq
\lim L(A_{fin})$. Indeed, an accepting run of $A$ on a word $w$ points to infinitely many prefixes
of $w$ that are accepted by $A_{fin}$. It is left to prove that $\lim L(A_{fin}) \subseteq L(A)$. Consider a
word $w \in \lim L(A_{fin})$. Thus, $w$ has infinitely many prefixes in $L(A_{fin})$. Since $A_{fin}$ is
an NFW, then $L(A_{fin}) = L(\mathcal{S}(A)_{fin})$. It follows that $w$ has infinitely many prefixes in
$L(\mathcal{S}(A)_{fin})$, or equivalently, that the run of $\mathcal{S}(A)$ on $w$ visits the set of accepting states
infinitely often, implying that $w \in L(\mathcal{S}(A))$. Since $A$ is Büchi-subset-type, $w$ is also
accepted by $A$, and we are done.

As proved in [MS97], an NBW that is DWW-realizable is also Büchi-subset-type
(note that since all DBW are Büchi-subset-type whereas not all DBW are
DWW-realizable, the other direction does not hold). It follows that all NBW that are
DWW-realizable are finite-type and can be linearly translated to the corresponding NFW.

We now show that powerset-typeness is not sufficient for finite-typeness, and the
stronger condition of subset-typeness is required.

Lemma 11. Powerset-type NBW are not finite-type. 

Proof. The NBW $A$ in Figure 8 recognizes the language of all words with infinitely
many $b$'s but no two successive $b$'s. The DBW obtained by augmenting the powerset
structure of $A$, also described in the figure, with the acceptance condition $\alpha_{\mathcal{P}} = \{\{1\}\}$
is equivalent to $A$. Thus, $A$ is powerset-type. On the other hand, $\mathcal{S}(A)$ is not equivalent
to $A$, and indeed, there is no way to augment $A_{fin}$ with an acceptance condition $\alpha_F$ that
results in an automaton $A_F$ for which $\lim(L(A_F)) = L(A)$. To see this, note that either
$\alpha_F$ is empty, in which case $L(A_F)$ is empty, or $\alpha_F$ is not empty, in which case $L(A_F)$
contains $a^+$, thus $\lim(L(A_F))$ contains $a^\omega$, which is not in $L(A)$.

Fig. 8. An NBW for $(a^+ \cdot b)^\omega$ that is powerset-type but not finite-type.



6 Discussion 

We studied three notions of typeness for automata on infinite words. The notions are
helpful in studying the complexity and complication of translations between the various
classes of automata. Of special interest is the blow-up involved in a translation of NBW
to NCW, when it exists. As discussed in Section 4, a polynomial translation will enable an
exponential translation of LTL to alternation-free $\mu$-calculus (for formulas that can be
expressed in the alternation-free $\mu$-calculus), improving the doubly-exponential known
upper bound. Current translations of NBW to NCW actually construct a DCW with
$2^{O(n \log n)}$ states (starting with an NBW with $n$ states), whereas not even a super-linear
lower bound is known.

A related notion has to do with the translation of an NBW to an NFW whose limit
language is equivalent to that of the NBW. We also studied this notion, and
characterized NBW that are finite-type, and for which a linear translation exists. We hope to
relate finite-typeness with co-Büchi typeness, aiming at developing more techniques and
understanding for approaching the NBW to NCW problem.

Abstract. This paper proposes a partial order reduction algorithm for timed trace
theoretic verification in order to detect both safety failures and timing failures of
timed circuits efficiently. This algorithm is based on the framework of timed trace
theoretic verification according to the original untimed trace theory. Consequently,
its conformance checking supports hierarchical verification. Experiments with
the STARI circuits show the effectiveness of the proposed approach.



1 Introduction 

Timed circuits have rapidly gained importance in integrated digital circuit
design. Thus, the verification of such timed circuits is imperative. However, the cost of timing
verification is quite high. Several approaches have been proposed in order to reduce the
average complexity of verification, e.g. symbolic methods based on BDDs and partial
order reduction. Symbolic methods are difficult to apply efficiently to timing verification,
yet partial order reduction is one of the most promising solutions to the state explosion
problem, e.g. [1, 2, 3, 4]. Hence, verification methods for which partial order reduction is
well-suited are preferred.

One direction is a timing analysis algorithm to validate correctness in the level-ruled
Petri net [5]. Another direction is the simple timed trace theory based on time Petri
nets [6]. Partial order reduction is applied to both of them, but they are unable to
perform verification hierarchically by using the conformation relation.

In addition, [6] has no ability to verify liveness properties, i.e. only safety failures
are detected. A framework of timed trace theoretic verification based on pseudo failure
is proposed in [7], in which not only safety failures but also timing failures are
examined. The timing failure is a restricted form of a violation of the liveness property.
Practically, detecting timing failures is adequate to verify timed circuits. Moreover, even
though this approach supports hierarchical structure as the original trace theory [8] does,
it still differs from the original one in many points. Eventually, the framework of timed
trace theoretic verification in accordance with the original one, as well as the concept of
semimodule and semimirror allowing conformance checking, has been proposed in [9].

In this paper, we propose a partial order reduction algorithm for the verification
method of the latest framework, and show its effectiveness through experimental results.

** This research is supported by SRC contract 2002-TJ-1024, and NSF Japan Program award
INT-008728I.

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 339-353, 2004.
© Springer-Verlag Berlin Heidelberg 2004

D. Pradubsuwun, T. Yoneda, and C. Myers



2 Timed Trace Theoretic Verification 

Since our algorithm relies on the framework of timed trace theoretic verification proposed
in [9], we briefly recall its idea in this section. The important notions are a module, a
semimodule, and a timed trace structure.

A module is a tuple $(I, O, N, \mathrm{wire})$, where $I$ is a set of input wires, $O$ is a set of
output wires ($I \cap O = \emptyset$), $N = (P, T, F, lb, ub, \mu^0)$ is a time Petri net, and
$\mathrm{wire} : T \to I \cup O$ is a function from a set of transitions to a set of wires. We say that if
$\mathrm{wire}(t) \in I$, $t$ is an input transition, and otherwise, $t$ is an output transition. In order to
simplify the analysis algorithm, we assume that for each $k \in O$, there exists at most one
output transition $t$ such that $\mathrm{wire}(t) = k$ in a module. $k(out)$ is used to represent such a
transition $t$, especially in the figures, for simplicity. On the other hand, we allow multiple
input transitions for an input wire $k$. Thus, $k(in)_1, k(in)_2, \ldots$ are used for such input
transitions, while $k(in)$ is used in cases where there exists only one corresponding input
transition. Figure 1 shows examples of modules. A time Petri net is a Petri net except that
each transition $t$ of a time Petri net has two non-negative rationals, the earliest firing time
$lb(t)$ and the latest firing time $ub(t)$, and that each enabled transition must fire within
this time bound. A timed run $\rho = \sigma_0 \xrightarrow{t_1} \sigma_1 \xrightarrow{t_2} \sigma_2 \cdots$ of $N$ is a finite or infinite
sequence of states and transitions such that $\sigma_0$ is the initial state, and $\sigma_{i+1}$ is obtained
from $\sigma_i$ by passing some time and then firing transition $t_{i+1}$. Its corresponding timed
trace is $e_1 e_2 e_3 \cdots$, where $e_i = (\mathrm{wire}(t_i), \tau_i)$ is an event and $\tau_i$ denotes the time when
the transition $t_i$ fires. $\mathrm{trace}(N)$ is the set of all timed traces generated by $N$.

A semimodule is the same as a module, but the corresponding timed trace structure
is defined differently, as shown later.

A timed trace structure of a module $M = (I, O, N, \mathrm{wire})$, denoted by $\mathcal{T}(M)$, is a
tuple $(I, O, S, F)$, where $S$ and $F$ are sets of timed traces defined as follows. $S$, which
is called the success trace set, is equal to $\mathrm{trace}(N)$. $F$, which is called the failure trace set,
contains a trace $y(w, \tau) \notin S$ iff either

- $y \in F$, or

- $y \in S$, $\tau \le \mathrm{TL}(y, N)$, $w \in I$, or

- $y \in S$, $\tau > \mathrm{TL}(y, N)$, $\mathrm{limit}(y, N) \subseteq I$.

The first condition is easy, i.e. any extension of a failure trace is also a failure trace.
$\mathrm{TL}(y, N)$ is the latest time that the firings of all enabled transitions in $N$ can be postponed
after $y$. For example, in Figure 1, $\mathrm{TL}(\varepsilon, N_0)$ is 5 in $M_0$, and $\mathrm{TL}((a, 5), N_1)$ is 15 in $M_1$.

The second condition above states the case that the net can reach the time $\tau$ with
$\tau \le \mathrm{TL}(y, N)$, but cannot accept an input. It is considered as a failure. Note that if $w$ is
an output in this case, then $y(w, \tau)$ is neither a success nor a failure, i.e. it is not possible.
This difference comes from the fact that a net can control producing or not producing
an output, but cannot control an input.

Fig. 1. Modules that cause a timing failure.

$\mathrm{limit}(y, N)$ is the set of wires that correspond to the enabled transitions which
determine $\mathrm{TL}(y, N)$. In the example above, $\mathrm{limit}(\varepsilon, N_0) = \{a, b\}$, because $\mathrm{TL}(\varepsilon, N_0) = 5$
and both transitions determine this time. In the third condition, $\tau > \mathrm{TL}(y, N)$ implies
that some transition causes a time-out from $\mathrm{limit}(y, N) \subseteq I$, because an input cannot
be controlled by the net. Hence, this must be a failure. Again, if $\mathrm{limit}(y, N)$ contains
an output wire, then $y(w, \tau)$ is not possible, because an output is controlled by the net,
and so, the net never reaches this time point. These considerations are summarized as
follows, where $P = S \cup F$ is the set of traces that are possible.

C1: For $w \in I \cup O$, and $\tau \in \mathbb{R}^+$ (the set of non-negative real numbers),

1. $S = \mathrm{trace}(N)$,
2. for $y \in S$ and $y(w, \tau) \notin S$,
   a) if $\tau \le \mathrm{TL}(y, N)$, then
      i. if $w \in I$, then $y(w, \tau) \in F$
      ii. else $y(w, \tau) \notin P$
   b) else if $\mathrm{limit}(y, N) \subseteq I$, then $y(w, \tau) \in F$
   c) else $y(w, \tau) \notin P$
3. for $y \in F$, $y(w, \tau) \in F$,
4. for $y \notin P$, $y(w, \tau) \notin P$.
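Rule 2 of C1 is a small decision procedure, and a sketch may make the case split easier to follow. The function below is illustrative only: its name and arguments are assumptions, it reads the first comparison as "at most TL", and it expects the caller to supply TL and the limit set, which in the paper are derived from the net.

```python
def classify_c1(tau, tl, wire, inputs, limit):
    """Classify a one-event extension y(w, tau) of a success trace y under
    rule 2 of C1, assuming y(w, tau) is not itself a success trace.

    tau    -- time of the new event
    tl     -- TL(y, N), the latest time the net can wait after y
    wire   -- the wire w of the new event
    inputs -- the input wire set I of the module
    limit  -- limit(y, N), the wires whose transitions determine TL(y, N)
    """
    if tau <= tl:
        # The net can reach time tau: refusing an input is a failure,
        # while an output that the net does not produce is simply impossible.
        return "failure" if wire in inputs else "impossible"
    if set(limit) <= set(inputs):
        # Only inputs force the deadline, so the environment can miss it.
        return "failure"
    # Some output forces the deadline; the net itself never lets time pass TL.
    return "impossible"
```

Using the values quoted in the text for Figure 1 (TL is 5 and the limit set is {a, b}), and assuming a is the only input wire of the module, `classify_c1(6, 5, "a", {"a"}, {"a", "b"})` returns `"impossible"` because the output b also determines the deadline.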

In order to define the correctness between modules, we use notions given in [8]. For
$\mathcal{T}_1 = (I_1, O_1, S_1, F_1)$ and $\mathcal{T}_2 = (I_2, O_2, S_2, F_2)$ such that $I_1 \cup O_1 = I_2 \cup O_2$, the
intersection of $\mathcal{T}_1$ and $\mathcal{T}_2$, denoted by $\mathcal{T}_1 \cap \mathcal{T}_2$, is a timed trace structure
$(I_1 \cap I_2, O_1 \cup O_2, S_1 \cap S_2, (F_1 \cap P_2) \cup (P_1 \cap F_2))$. Note that a trace is a failure of the intersection only
when it is a failure of one module and it is possible for the other. For considering
the similar notion for modules with different wire sets, the following notions are needed.
For a timed trace $x = e_1 e_2 e_3 \cdots$ and a set $D$ of wires,

$$\mathrm{del}(D, x) = \begin{cases} y & \text{if } e_1 = (w_1, \tau_1),\ w_1 \in D \\ e_1 y & \text{else,} \end{cases}$$

defines a timed trace obtained by projecting out the events on wires in $D$, where
$y = \mathrm{del}(D, e_2 e_3 \cdots)$. If $x = \varepsilon$, then $\mathrm{del}(D, x) = \varepsilon$. For a set $X$ of timed traces,
$\mathrm{del}^{-1}(D, X)$ is the set $\{x \mid \mathrm{del}(D, x) \in X\}$. For $X$ whose elements do not contain any wires in
$D$, $\mathrm{del}^{-1}(D, X)$ is the set of all timed traces that can be generated by inserting any event
of $D$ between any consecutive events in traces of $X$. This is extended for a timed trace
structure $\mathcal{T} = (I, O, S, F)$ such that $\mathrm{del}^{-1}(D, \mathcal{T}) = (I \cup D, O, \mathrm{del}^{-1}(D, S), \mathrm{del}^{-1}(D, F))$.
Note that the inserted wires are always considered to be the inputs. The composition
of $\mathcal{T}_1$ and $\mathcal{T}_2$ is defined as $\mathcal{T}_1 \| \mathcal{T}_2 = \mathrm{del}^{-1}(D_2, \mathcal{T}_1) \cap \mathrm{del}^{-1}(D_1, \mathcal{T}_2)$, where
$D_1 = (I_1 \cup O_1) - (I_2 \cup O_2)$ and $D_2 = (I_2 \cup O_2) - (I_1 \cup O_1)$. From this, $\mathcal{T}_1 \| \mathcal{T}_2$ is failure-free
iff $(\mathrm{del}^{-1}(D_2, F_1) \cap \mathrm{del}^{-1}(D_1, P_2)) \cup (\mathrm{del}^{-1}(D_2, P_1) \cap \mathrm{del}^{-1}(D_1, F_2)) = \emptyset$.
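The projection and its inverse image can be sketched for finite traces as follows. This is an illustration under assumed encodings: a timed trace is a list of (wire, time) pairs, a trace set is a Python set of tuples, and the function names are hypothetical.

```python
def delete(wires, trace):
    """del(D, x): drop every event of the finite trace whose wire is in D."""
    return [(w, tau) for (w, tau) in trace if w not in wires]

def in_inverse_delete(wires, trace, traces):
    """Membership test for del^{-1}(D, X): x is in it iff del(D, x) is in X."""
    return tuple(delete(wires, trace)) in traces

# A trace set X over wires {a, b} and a longer trace that also mentions wire c.
X = {(("a", 1.0), ("b", 3.0))}
x = [("a", 1.0), ("c", 2.0), ("b", 3.0)]
```

Here `in_inverse_delete({"c"}, x, X)` holds, since removing the event on wire c from `x` yields a trace of `X`. This is the sense in which the inverse image inserts arbitrary events on the wires of the other structure before the two structures are intersected.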
The correctness between modules is defined as follows. $M_1 = (I_1, O_1, N_1, \mathrm{wire}_1)$
conforms to $M_2 = (I_2, O_2, N_2, \mathrm{wire}_2)$ if $I_1 = I_2$, $O_1 = O_2$, and it holds that for
any timed trace structure $\mathcal{E} = (I_E, O_E, S_E, F_E)$ such that $I_2 = O_E$ and $O_2 = I_E$, if
$\mathcal{E} \| \mathcal{T}(M_2)$ is failure-free, so is $\mathcal{E} \| \mathcal{T}(M_1)$*. This conformation relation implies that $M_1$
behaves similarly to $M_2$ with any environment with respect to failure-freeness. From
this correctness definition, the following property is inherited.

Theorem 1. $\{M_1, \cdots, M_{k-1}, M_{k_1}, \cdots, M_{k_m}, M_{k+1}, \cdots, M_n\}$ conforms to $M_S$ if
$\{M_{k_1}, \cdots, M_{k_m}\}$ conforms to $M_k$, and $\{M_1, \cdots, M_{k-1}, M_k, M_{k+1}, \cdots, M_n\}$
conforms to $M_S$.

The proof is shown in [8]. This is what we call hierarchical verification. 

Practically, however, considering all possible timed trace structures $\mathcal{E}$ is infeasible.
Instead, the mirror of $\mathcal{T}(M_2)$ can be used. Intuitively, it is a maximal $\mathcal{E}$ such that
$\mathcal{E} \| \mathcal{T}(M_2)$ is failure-free, and formally, for a timed trace structure $\mathcal{T} = (I, O, S, F)$, its mirror,
denoted by $\mathcal{T}^m$, is also a timed trace structure $(I', O', S', F')$ satisfying $I' = O$, $O' = I$, $S' = S$,
and $F' = A^* - P$, where $A = I \cup O$ and $A^*$ is the set of all timed traces over $A$. Then, it can be
shown that $M_1$ conforms to $M_2$ iff $\mathcal{T}(M_1) \| \mathcal{T}^m(M_2)$ is failure-free [8], which is called
the mirror property. In the untimed trace theory, for a module $M = (I, O, N, \mathrm{wire})$,
$\mathcal{T}^m(M)$ is coincidentally equal to $\mathcal{T}(M')$ such that $M' = (O, I, N, \mathrm{wire})$. This can be
explained intuitively as follows. Suppose that $\mathcal{T}(M') = (O, I, S', F')$. First, $S' = S$,
because the net is the same. In untimed systems, every trace $y(w, \tau)$ satisfies
$\tau \le \mathrm{TL}(y, N)$ because there is no upper bound. Thus, for $y \in S'$, if $y(w, \tau) \notin S'$ is a failure,
then $w$ must be an input of $M'$, i.e. $w \in O$. This is an impossible trace for $M$, because
$w$ is an output for $M$. Hence, $F' = A^* - P$ holds, and so $\mathcal{T}(M') = (O, I, S, A^* - P)$,
which is equal to $\mathcal{T}^m(M)$ by definition. From this fact, it is straightforward to implement
conformance checking for untimed systems.

Unfortunately, for timed systems, this construction of mirror trace structures is not
correct (see [9] for details). Note that such mirroring is only necessary for a module
representing a specification. Thus, in order to obtain a timed conformance checking
algorithm similar to the one for the untimed case, we introduce a slightly different version
of a module for the specification. Since this is no longer a module whose timed
trace structure is defined by C1, we call it a semimodule. A semimodule is also a tuple
$(I, O, N, \mathrm{wire})$, but its timed trace structure $(I, O, S, F)$, denoted by $\mathcal{T}_s(M)$, is defined
as follows.

C2: For $w \in I \cup O$, and $\tau \in \mathbb{R}^+$,

1. $S = \mathrm{trace}(N)$,
2. for $y \in S$ and $y(w, \tau) \notin S$,
   a) if $\tau \le \mathrm{TL}(y, N)$, then
      i. if $w \in I$, then $y(w, \tau) \in F$
      ii. else $y(w, \tau) \notin P$
   b) else if $\mathrm{limit}(y, N) \subseteq O$, then $y(w, \tau) \notin P$
   c) else $y(w, \tau) \in F$
3. for $y \in F$, $y(w, \tau) \in F$,
4. for $y \notin P$, $y(w, \tau) \notin P$.

* For this definition, the composition ($\|$) can be replaced by the intersection ($\cap$), because the
signal sets are common. $M_1$ is, however, given as a set of modules in many cases where $\mathcal{T}(M_1)$
is defined by the composition of the timed trace structures of the elements in $M_1$. Thus, using
composition in the conformance definition is useful.

The idea is to modify the case for $\tau > \mathrm{TL}(y, N)$ such that $F' = A^* - P$ holds for
$P$ of a module $(I, O, N, \mathrm{wire})$ and $F'$ of a semimodule $(O, I, N, \mathrm{wire})$.
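The intended complementarity can be checked case by case for one-event extensions of a success trace. The sketch below is illustrative: it abstracts each situation into three booleans, reads the first comparison of C1 and C2 as "at most TL", and only covers rule 2 (extensions $y(w, \tau) \notin S$ of some $y \in S$), not the closure rules 3 and 4. All names are assumptions.

```python
from itertools import product

def c1(early, w_is_input, limit_within_inputs):
    """Rule 2 of C1 for a module; `early` means tau <= TL(y, N)."""
    if early:
        return "F" if w_is_input else "impossible"
    return "F" if limit_within_inputs else "impossible"

def c2(early, w_is_input, limit_within_outputs):
    """Rule 2 of C2 for a semimodule; `early` means tau <= TL(y, N)."""
    if early:
        return "F" if w_is_input else "impossible"
    return "impossible" if limit_within_outputs else "F"

def complementary():
    """Check that the semimodule (O, I, N, wire) fails exactly on the
    extensions that are impossible for the module (I, O, N, wire)."""
    for early, w_in_I, limit_in_I in product([True, False], repeat=3):
        module = c1(early, w_in_I, limit_in_I)
        # Swapping I and O: w is an input of the semimodule iff it is an
        # output of the module, and "limit within O'" means "limit within I".
        semi = c2(early, not w_in_I, limit_in_I)
        if (module == "F") == (semi == "F"):
            return False
    return True
```

Calling `complementary()` returns `True`, which matches the stated goal: on these extensions, a trace is a failure of the semimirror exactly when it is not possible for the module.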

For a module $M = (I, O, N, \mathrm{wire})$, we say that a semimodule $(O, I, N, \mathrm{wire})$ is the
semimirror of $M$, denoted by $M^{sm}$. Consequently, for timed systems, $M_1$ conforms to
$M_2$ iff $\mathcal{T}(M_1) \| \mathcal{T}_s(M_2^{sm})$ is failure-free [9].



3 Verification Algorithm 

In order to describe the algorithm, we first make several definitions as follows. The state
space of time Petri nets is infinite, because each clock function takes real values. Thus,
a set of inequalities is used to represent a number of different clock functions in order to
obtain a finite representation of the state space. That is, a state of a time Petri net $N_i$
is $(\mu_i, K_i)$, where $\mu_i$ is a marking of $N_i$, and $K_i$ is a set of inequalities over both past
and future variables of transitions. For a transition $t$, its past variable, denoted also by $t$,
represents its most recent firing time, while its future variable, denoted by $\hat{t}$, represents
its next firing time. The variables representing older firings are projected out from $K_i$. In
the initial state, a virtual past variable $v$ is used. It can be considered that $v$ represents the
time when the net is initialized, and that all of the initially enabled transitions become
enabled at that time.

Consider $\mathcal{M} = \{M_0, M_1, \ldots, M_{n-1}\}$ with $M_i = (I_i, O_i, N_i, \mathrm{wire}_i)$ such that $M_0$ is a
semimodule, $\{M_1, M_2, \cdots, M_{n-1}\}$ is a set of modules, and $N_i = (P_i, T_i, F_i, lb_i, ub_i, \mu_i^0)$.
Its state $S$ is defined by a tuple of the local states, that is, $S = (\sigma_0, \sigma_1, \ldots, \sigma_{n-1})$,
where $\sigma_i = (\mu_i, K_i)$ is a state of $N_i$. For a transition $t \in T_i$, $\bullet t$ denotes the set of source
places of $t$, i.e., $\{p \mid (p, t) \in F_i\}$. Similarly, the set of destination places of $t$ is denoted by
$t \bullet$. The set of enabled transitions in $N_i$ at $\sigma_i$, and in $\mathcal{M}$ at $S$, are denoted by
$\mathrm{enabled}(\sigma_i) = \{t \mid t \in T_i, \bullet t \subseteq \mu_i\}$ and $\mathrm{enabled}(S) = \bigcup_{i=0}^{n-1} \mathrm{enabled}(\sigma_i)$, respectively. A firable
transition $t$ in $\mathcal{M}$ is an enabled transition that can fire earlier than any other enabled
transitions in $\mathcal{M}$, i.e. $\mathrm{Solution}(\mathrm{first}(S, t) \cup \bigcup_{i=0}^{n-1} K_i) \neq \emptyset$ holds, where $\mathrm{Solution}(K)$
is a set of feasible vectors of $K$ and $\mathrm{first}(S, t) = \{\hat{t} \le \hat{t}' \mid t' \in \mathrm{enabled}(S)\}$. $\mathrm{firable}(S)$
denotes the set of firable transitions in $\mathcal{M}$ at $S$. For an enabled transition $t$, $\mathrm{Eft}(t)$ and
$\mathrm{Lft}(t)$ are $t_p + lb(t)$ and $t_p + ub(t)$, respectively, where $t_p$ is the past variable for a true
parent of $t$, which is an output transition whose firing made $t$ enabled most recently.
These expressions represent the lower bound and the upper bound of the next firing time
of $t$.
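The firability test can be illustrated in a simplified setting. The sketch below assumes that the firing time of each true parent is a known number, so that Eft and Lft are concrete values and each next firing time is constrained only by its own interval; the actual algorithm works on systems of inequalities over past and future variables. The function names and the sample bounds are hypothetical.

```python
def eft_lft(parent_time, lb, ub):
    """Earliest and latest next firing time of a transition whose true
    parent fired at parent_time."""
    return parent_time + lb, parent_time + ub

def firable(bounds):
    """Transitions that can fire no later than every other enabled one.

    bounds maps each enabled transition to its (Eft, Lft) pair. A transition
    can fire first iff its Eft does not exceed the smallest Lft, because
    time cannot pass beyond the earliest deadline."""
    deadline = min(lft for _, lft in bounds.values())
    return {t for t, (eft, _) in bounds.items() if eft <= deadline}

bounds = {
    "a": eft_lft(0, 2, 5),   # may fire in [2, 5]
    "b": eft_lft(0, 6, 8),   # may fire in [6, 8]
    "c": eft_lft(1, 3, 9),   # may fire in [4, 10]
}
```

In this example the earliest deadline is 5, so `firable(bounds)` contains `a` and `c` but not `b`, whose earliest firing time 6 lies beyond the deadline.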

Our goal is to decide whether a set $\{M_1, M_2, \cdots, M_{n-1}\}$ of modules conforms
to a module $M_S$ or not, which is achieved by checking whether
$\mathcal{T}(\mathcal{M}) = \mathcal{T}_s(M_0) \| \mathcal{T}(M_1) \| \mathcal{T}(M_2) \| \cdots \| \mathcal{T}(M_{n-1})$ is failure-free or not, where $M_0$ is the semimirror of
$M_S$. For this purpose, the state space of $N_i$ is traversed, and if every reachable state
satisfies the following three conditions, then $\mathcal{T}_s(M_0) \| \mathcal{T}(M_1) \| \mathcal{T}(M_2) \| \cdots \| \mathcal{T}(M_{n-1})$
is concluded to be failure-free.

Condition 1

For any output transition $t \in \mathrm{firable}(S)$ of a (semi)module $M_i$ with $0 \le i \le n-1$,
and for any (semi)module $M_j$ with $0 \le j \le n-1$ such that $\mathrm{wire}_i(t) \in I_j$,
there exists an enabled input transition $u$ in $M_j$ with $\mathrm{wire}_j(u) = \mathrm{wire}_i(t)$ such that
$\mathrm{Solution}(\{\mathrm{Eft}(u) > \hat{t}\} \cup \bigcup_{l=0}^{n-1} K_l) = \emptyset$ holds.

Condition 2

For any input transition $t \in \mathrm{firable}(S)$ of a module $M_i$ with $1 \le i \le n-1$,
there exists an output transition $u \in \mathrm{firable}(S)$ in some (semi)module such that
$\mathrm{Solution}(\{\hat{u} > \mathrm{Lft}(t)\} \cup \bigcup_{l=0}^{n-1} K_l) = \emptyset$ holds.

Condition 3

For any input transition $t \in \mathrm{firable}(S)$ of the semimodule $M_0$, either

1. there exists an output transition $u \in \mathrm{firable}(S)$ in a module $M_i$ with $1 \le i \le n-1$
   such that $\mathrm{Solution}(\{\hat{u} > \mathrm{Lft}(t)\} \cup \bigcup_{l=0}^{n-1} K_l) = \emptyset$ holds, or

2. there exists an output transition $u \in \mathrm{firable}(S)$ in the semimodule $M_0$ such that
   $\mathrm{Solution}(\{\hat{u} \ge \mathrm{Lft}(t)\} \cup \bigcup_{l=0}^{n-1} K_l) = \emptyset$ holds.

Fig. 2. Modules that cause safety and timing failures.

Intuitively, an output produced by a (semi)module must be accepted by some other
(semi)module, otherwise a failure exists. Condition 1 checks this kind of failure,
which we call a safety failure. A state that causes a safety failure is called a safety
failure state. Consider the example shown in Figure 2(a), where $t$ is $b(out)$ of $M_1$ and $u$
is $b(in)$ of $M_2$. Suppose that both $M_1$ and $M_2$ are modules. For a timed trace $y$ where a
transition $t$ is enabled in its last state, let $\mathrm{ep}(t, y)$ denote the earliest firing time point of $t$
in $y$, i.e., the value of $\mathrm{Eft}(t)$ obtained by assigning $t_p$ to its actual firing time in $y$. $\mathrm{lp}(t, y)$
is defined similarly. If $\mathrm{ep}(t, y) < \mathrm{ep}(u, y)$ holds, then $y(b, \tau) \in P_i$ and $y(b, \tau) \in F_j$ for
$\mathrm{ep}(t, y) \le \tau < \mathrm{ep}(u, y)$, where $b = \mathrm{wire}_i(t) = \mathrm{wire}_j(u)$. Hence, $y(b, \tau)$ is a failure
of $\mathcal{T}(M_i) \| \mathcal{T}(M_j)$. In this example, $\mathrm{ep}(b(out), \varepsilon) = 6$, $\mathrm{ep}(b(in), \varepsilon) = 8$, and so, for
example, $(b, 7) \in P_1$ and $(b, 7) \in F_2$. Thus, $(b, 7)$ is a failure in $\mathcal{T}(M_1) \| \mathcal{T}(M_2)$. In
this case, such modules have a feasible solution in $\{\mathrm{Eft}(u) > \hat{t}\} \cup \bigcup_{l=0}^{n-1} K_l$, and so
condition 1 does not hold.
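The numbers of this example can be replayed with a small sketch. It assumes concrete earliest firing time points instead of inequality systems, and it treats the failing times as the half-open window from the output's earliest point up to the input's earliest point; the function names are illustrative.

```python
def safety_failure_times(ep_output, ep_input):
    """Half-open window [ep_output, ep_input) of times at which the output
    may fire before the matching input transition can accept it, or None
    when there is no such time."""
    if ep_output < ep_input:
        return (ep_output, ep_input)
    return None

def condition1_holds(ep_output, ep_input):
    """Simplified condition 1: the constraint Eft(u) > t_hat is infeasible,
    i.e. the input is never ready later than the output can fire."""
    return safety_failure_times(ep_output, ep_input) is None
```

With the values from the text, `safety_failure_times(6, 8)` gives the window from 6 to 8, which contains the failing time 7, and `condition1_holds(6, 8)` is `False`.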

On the other hand, an input expected by a (semi)module must be given in time by
some other (semi)module, otherwise a failure occurs. Condition 2 and condition 3
check this kind of failure, which we call a timing failure. A state that causes a timing failure
is called a timing failure state. Consider the example shown in Figure 2(b), where $t$ is
$b(in)$ of $M_1$ and $u$ is $a(out)$ of $M_2$. Suppose that both $M_1$ and $M_2$ are modules, and
so, this is handled by condition 2. For a timed trace $y$, suppose that $\mathrm{lp}(t, y) < \mathrm{lp}(u, y)$
holds and $\mathrm{lp}(u, y)$ is the smallest latest firing time point among the enabled output
transitions. Then, $y(b, \tau) \in F_i$ and $y(b, \tau) \in P_j$ for $\mathrm{lp}(t, y) < \tau \le \mathrm{lp}(u, y)$, where
$b = \mathrm{wire}_i(t)$. The former holds because $\mathrm{limit}(y, N_i) \subseteq I$ holds from the assumption
and C1. Hence, $y(b, \tau)$ is a failure of $\mathcal{T}(M_i) \| \mathcal{T}(M_j)$. In this example, $\mathrm{lp}(b(in), \varepsilon) = 5$
and $\mathrm{lp}(a(out), \varepsilon) = 10$, and so, for $\tau = 9$, $(b, 9) \in F_1$ and $(b, 9) \in P_2$. Thus, $(b, 9)$
is a failure in $\mathcal{T}(M_1) \| \mathcal{T}(M_2)$. In this case, such modules have a feasible solution in
$\{\hat{u} > \mathrm{Lft}(t)\} \cup \bigcup_{l=0}^{n-1} K_l$, and so, condition 2 does not hold.

Fig. 3. (a) $t_1$ disables $t_2$. (b) $t_1$ hides a timing failure.

Condition 3.1 is for the case where $t$ is an input transition of a semimodule $M_0$
and $u$ is an output transition of some other module such that $\mathrm{lp}(u, y)$ is the smallest latest
firing time point among the enabled output transitions. This case can be handled in the
same way as condition 2.

Condition 3.2 is for the case where output transitions with the smallest latest
firing time point exist only in $M_0$. In this case, we need to further consider a special case
where such an output transition $u$ has the same latest firing time point as $t$, i.e., for a timed
trace $y$, $\mathrm{lp}(u, y) = \mathrm{lp}(t, y)$ holds. Consider the example shown in Figure 1 again, where $t$
is $a(in)$ of $M_0$ and $u$ is $b(out)$ of $M_0$, but suppose $M_0$ is a semimodule this time. Since
$M_0$ is a semimodule and $\mathrm{limit}(y, N_0)$ contains the input transition $t$, $y(a, \tau) \in F_0$ holds
from C2 for $\mathrm{lp}(t, y) < \tau \le \tau'$, where $a = \mathrm{wire}_0(t)$ and $\tau'$ is the smallest $\mathrm{lp}(u', y)$ among
the enabled output transitions $u'$ in modules $M_1, \ldots, M_{n-1}$. $y(a, \tau) \in P_i$ for modules $M_i$
with $1 \le i \le n-1$ from $\tau \le \tau'$. Hence, $y(a, \tau)$ is a failure of $\mathcal{T}(\mathcal{M})$. In this example,
$\mathrm{lp}(a(in), \varepsilon) = \mathrm{lp}(b(out), \varepsilon) = 5$ hold in $M_0$, and $\mathrm{lp}(a(out), \varepsilon) = 10$. Thus, for $\tau = 6$,
$(a, 6) \in F_0$ and $(a, 6) \in P_1$, and so $(a, 6)$ is a failure in $\mathcal{T}_s(M_0) \| \mathcal{T}(M_1)$. Thus,
condition 3.2 correctly handles this case by modifying the inequality $\{\hat{u} > \mathrm{Lft}(t)\}$ of
condition 3.1 to $\{\hat{u} \ge \mathrm{Lft}(t)\}$.



4 Partial Order Reduction for Timed Trace Theoretic Verification 

In this section, the idea of partial order reduction for timed trace theory is proposed.
Also, the difference between the proposed idea and [6] is discussed. The concept of
partial order reduction is to generate only a subset of the possible successor states as long as
the correctness is not affected. We call a state space generated according to this principle
the reduced state space. For timed trace theoretic verification, the reduced state space
$G_r = (S_r, R_r)$ should be constructed such that it satisfies the following conditions [PT1],
[PT2], and [PT3] with respect to the full state space $G_f = (S_f, R_f)$, where $S_r$ and $S_f$ are
the sets of reachable states for a set of (semi)modules, and $R_r$ and $R_f$ are the transition
relations between states ($R_f^*$ denotes the transitive closure of $R_f$). A transition relation
is a set of tuples $(s, t, s')$ such that $s'$ is obtained from $s$ by firing an output transition $t$
and its corresponding input transitions.

[PT1] $S_r$ includes the initial state, and for $s \in S_r$, if $s$ has successors in $G_f$, then $s$ has
at least one successor in $G_r$.

[PT2] For $s \in S_r$, if

- $(s, t_1, s_1) \in R_f$, and

- there exists a sequence $\eta$ such that $(s, \eta, s_3) \in R_f^*$, $(s_3, t_3, s'') \in R_f$, $(s_1, \eta, s') \in R_f^*$,
  $\eta t_3 = t_2 \cdots$ (if $\eta$ is empty, then $t_2 = t_3$), and the firing of $t_2$ is necessary
  for the firing of $t_3$, but

- $t_3$ is not enabled in $s'$ (see Fig. 3(a)),

then $(s, t_1, s_1) \in R_r$ implies $(s, t_2, s_2) \in R_r$.

[PT3] For $s \in S_r$, if

- $(s, t_1, s_1) \in R_f$, and

- there exists a sequence $\eta$ such that $(s, \eta, s_3) \in R_f^*$, $\eta = t_2 \cdots$, and $s_3$ has a timing
  failure, and

- $(s_1, \eta, s') \in R_f^*$, but

- states from $s_1$ to $s'$ along $\eta$ have no timing failure (see Fig. 3(b)),

then $(s, t_1, s_1) \in R_r$ implies $(s, t_2, s_2) \in R_r$.

[PT1] is vital, because a new deadlock state must not be introduced in $G_r$. [PT2]
is for handling conflicting transitions. This is depicted by Figure 4(a). In Figure 4(a),
since a firing of $t_1$ immediately disables $t_3$, if $G_r$ has only a successor by $t_1$ in $s$, then
we miss the firing of $t_3$. Thus, [PT2] fires $t_2$, which is equal to $t_3$ in this case ($\eta = \varepsilon$).
Moreover, a conflict may occur indirectly as shown in Figure 4(b). Suppose that $t_1$ is
in conflict with $t_3$, but $t_3$ is not enabled in $s$. In this case, the firing of $t_1$ does not
disable $t_3$ directly. However, the firing of $t_3$, which becomes possible when firing $t_2$
earlier than $t_1$, may be missed if $G_r$ contains only the firing of $t_1$ in $s$. Thus, it must
also contain the firing of $t_2$ in order to retain the possibility that the firing of $t_3$ occurs.
[PT3] is for handling transitions hiding timing failures. Such a transition is an enabled
output transition that has a larger latest firing time point than the others. For example,
consider the modules illustrated in Figure 5. If $b(out)$ fires at 10, then a timing failure
occurs in the resultant state, because $a(out)$ can fire later than $lp(a(in), (b, 10)) = 20$.
On the other hand, the current state is not a failure state, because $b(out)$ always fires
earlier than $lp(a(in), \varepsilon) = 20$. Also note that $a(out)$ is firable in this state. Therefore, if
$a(out)$ is fired in this state, the above timing failure is never detected. In other words,
$a(out)$ hides the timing failure, and it corresponds to $t_1$ of Figure 3(b). Hence, [PT3]
forces $b(out)$ to fire if $a(out)$ is chosen for firing.

Fig. 5. $a(out)$ hides a timing failure.



Due to the ignoring problem [1], we require that time certainly passes in any loop 
structure in any time Petri net in the module set and the latest firing time of each output 
transition is bounded. This is necessary to prove the correctness of the partial order 
reduction algorithm. 



5 State Enumeration 

This section briefly describes how to traverse the reduced state space of a set of (semi)modules.
The main idea is similar to that of [6]. Recall that we defined a state of a
set of (semi)modules by $s = (\sigma_0, \sigma_1, \ldots, \sigma_{n-1})$, where $\sigma_i = (\mu_i, K_i)$ is a state of
$N_i$. Here, we modify this definition a little for easier presentation. Let $\{N_0, \ldots, N_{n-1}\}$
denote a set of time Petri nets that compose the set of (semi)modules, where $N_i =
(P_i, T_i, F_i, lb_i, ub_i, \mu_i^0)$. We define the state of $\{N_0, \ldots, N_{n-1}\}$ by $(\mu, K)$, where
$\mu \subseteq \bigcup_{i=0}^{n-1} P_i$ is a marking of all the nets and $K$ is a set of inequalities.

The initial state of $\{N_0, \ldots, N_{n-1}\}$ is $(\mu_0, K_0)$, where

- $\mu_0 = \bigcup_{i=0}^{n-1} \mu_i^0$,

- $K_0 = \{$"$lb(t) \le \hat{t} - v \le ub(t)$" $\mid t$ is an output transition such that $t \in enabled(\mu_0)\}$.

Recall that $\hat{t}$ in $K_0$ is the future variable used to represent the next firing time
of an output transition $t$, and $v$ is a virtual variable to synchronize the nets at the
initial state. Note that we only consider the output transition variables for the state
space enumeration, because an input transition $t$ of $M_j = (I_j, O_j, N_j, wire_j)$ synchronizes
with $out\_trans(t)$, where $out\_trans(t)$ is the output transition that corresponds
to $t$, i.e., $out\_trans(t) = t'$ such that $wire_i(t') = wire_j(t)$, $t' \in T_i$,
$N_i = (P_i, T_i, F_i, lb_i, ub_i, \mu_i^0)$, $wire_i(t') \in O_i$, and $M_i = (I_i, O_i, N_i, wire_i)$. Furthermore,
we extend this notation by defining $out\_trans(t) = t$ for an output transition
$t$. $enabled$ is extended for $\mu$, that is, $enabled(\mu) = \bigcup_i enabled(\mu_i)$.

Then, the set of successor states is obtained from a state $s = (\mu, K)$ by firing an
output transition $t$. The output transition $t$ is chosen from a set $ready(s)$, which is
the smallest possible set of firable transitions that satisfies [PT1], [PT2], and [PT3]. The
ready set construction is described in the next section.

Sometimes more than one successor state $s' = (\mu', K')$ can be produced from a
state $s$ and an output transition $t \in ready(s)$. Those successors have the same marking
part $\mu'$ satisfying $\mu' = \bigcup_{i=0}^{n-1} \mu'_i$ such that

$$\mu'_i = \begin{cases} (\mu_i - {}^{\bullet}t') \cup t'^{\bullet} & \text{if } N_i \text{ has } t' \in sync\_trans(s, t), \\ \mu_i & \text{otherwise,} \end{cases}$$

where $sync\_trans(s, t)$ is the set of enabled transitions that fire synchronously with $t$,
i.e., $\{t' \mid t' \in enabled(\mu),\ out\_trans(t') = t\}$.
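
The marking update can be sketched as follows. This is a minimal illustration that assumes the place sets of the nets are pairwise disjoint, so that the overall marking can be kept as one set of place names; `pre` and `post` are hypothetical dictionaries giving the source and destination places of each transition.

```python
def fire_marking(marking, sync_trans, pre, post):
    """Marking part of a successor state.

    marking    -- set of marked places over all nets
    sync_trans -- transitions that fire synchronously (the output and its inputs)
    pre, post  -- dicts mapping a transition to its source / destination places
    """
    new_marking = set(marking)
    for t in sync_trans:
        new_marking -= pre[t]   # remove tokens from the source places of t
        new_marking |= post[t]  # add tokens to the destination places of t
    return frozenset(new_marking)


# Hypothetical two-net example: a(out) in one net fires together with a(in) in the other.
pre = {"a_out": {"p0"}, "a_in": {"q0"}}
post = {"a_out": {"p1"}, "a_in": {"q1"}}
print(sorted(fire_marking({"p0", "q0", "r0"}, ["a_out", "a_in"], pre, post)))
```

Nets that contain no synchronizing transition keep their places unchanged, which corresponds to the "otherwise" case of the definition.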

To obtain the inequality part $K'$ of each successor, we first define $J_1$ as follows.

$$J_1 = K \cup \{\hat{t} \le \hat{t}' \mid t' \in ready(s)\} \cup \{t = \hat{t}\}$$

This expresses the constraint that the firing time of $t$ is no larger than that of any other
transition in $ready(s)$. Since $t$ fires, its future variable $\hat{t}$ is copied to its past variable $t$.

Let $conflict(u)$ be the set of transitions conflicting with $u$, i.e., $\{u' \mid {}^{\bullet}u \cap {}^{\bullet}u' \ne \emptyset\}$.
Furthermore, for a set $U$ of variables and a set $K$ of inequalities, $K' = delete(K, U)$ is a
unique set of inequalities over $var(K) - U$, such that the solution set of $K'$ is equal to the
solution set of $K$ projected on $var(K) - U$, where $var(K)$ denotes the set of variables
appearing in $K$. Since the transitions $u$ in $sync\_trans(s, t)$ are also fired at the firing of $t$,
the enabled output transitions $t' \in conflict(u)$ for $u \in sync\_trans(s, t)$ are disabled by
the firing of $t$. Thus, the variables for those disabled transitions are no longer necessary,
and the following $J_2$ is obtained by deleting those variables for the disabled transitions.

$$J_2 = delete(J_1, \{\hat{t}' \mid t' \text{ is an output transition in } enabled(\mu) \cap conflict(u) \text{ for } u \in sync\_trans(s, t)\})$$
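
One common way to realize such a projection is sketched below. It assumes that every inequality is a non-strict difference constraint of the form $x - y \le c$, which is an assumption of this sketch rather than a statement about the actual implementation; the bounds are tightened with the Floyd-Warshall algorithm and the constraints mentioning deleted variables are then dropped. The name `close` and the dictionary encoding are illustrative.

```python
import itertools
import math


def close(variables, bounds):
    """Tighten constraints x - y <= c by all-pairs shortest paths."""
    d = {(x, y): (0 if x == y else math.inf) for x in variables for y in variables}
    for pair, c in bounds.items():
        d[pair] = min(d[pair], c)
    # product() varies its first component slowest, so k is the outermost loop.
    for k, i, j in itertools.product(variables, repeat=3):
        d[(i, j)] = min(d[(i, j)], d[(i, k)] + d[(k, j)])
    if any(d[(x, x)] < 0 for x in variables):
        raise ValueError("inconsistent constraints")
    return d


def delete(bounds, drop):
    """Project the solution set of `bounds` onto the variables not in `drop`."""
    variables = sorted({v for pair in bounds for v in pair})
    d = close(variables, bounds)
    keep = [v for v in variables if v not in drop]
    return {(x, y): d[(x, y)] for x in keep for y in keep
            if x != y and d[(x, y)] < math.inf}


# 2 <= a - v <= 5 and 1 <= b - a <= 3; deleting a leaves 3 <= b - v <= 8.
K = {("a", "v"): 5, ("v", "a"): -2, ("b", "a"): 3, ("a", "b"): -1}
print(delete(K, {"a"}))
```

Tightening before dropping matters: removing the constraints on `a` without closing first would lose the bounds on `b - v` that are only implied through `a`.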

Next, the true parents of newly enabled transitions are determined. Let $E(s, s', t)$
denote the set of newly enabled output transitions in a successor $s'$ of
$s$ by $t$. Note that it includes the transitions that become enabled by the firings of input
transitions in $sync\_trans(s, t)$. Consider a transition $t_i \in E(s, s', t)$. If $t_i$ has more
than one source place, i.e., $|{}^{\bullet}t_i| > 1$, we need to decide among the candidates $X(t_i)$
of the true parent of $t_i$ which transition really enables $t_i$, where $X(t_i) = \{x \mid x \in
out\_trans({}^{\bullet}({}^{\bullet}t_i)) \cap var(J_2)\}$. Since we have choices of a true parent for each newly
enabled transition in $E(s, s', t)$, we have successors which correspond to the possible
combinations of true parents. Here, consider one such combination $(u_1, u_2, \ldots, u_m)$ for
$E(s, s', t) = \{t_1, t_2, \ldots, t_m\}$, where $u_i \in X(t_i)$ is the true parent of $t_i$. The timing
constraint needed for it is that $u_i$ can fire later than the other true parent candidates. That
is,

$$J_3 = J_2 \cup \bigcup_{1 \le i \le m} \{x \le u_i \mid x \in X(t_i)\}.$$

Finally,

$$J_4 = J_3 \cup \bigcup_{1 \le i \le m} \{lb(t_i) \le \hat{t}_i - u_i \le ub(t_i)\}$$

decides the firing time of $t_i$ based on $u_i$, $lb(t_i)$, and $ub(t_i)$. This $J_4$ has all the necessary
information for the inequality part of $s'$. It, however, still contains some unnecessary past
variables, and deleting them is necessary to make the state space finite. A past variable
$u$ is necessary if the transition $u$ has some chance to become a true parent of some
currently disabled transition $t$ such that $u^{\bullet} \cap {}^{\bullet}t \ne \emptyset$, or the transition $u$ is a true parent
of some enabled transition. The former condition is considered in [6], but the latter is
a new condition needed in our method, because a transition which is a true parent of
some enabled transition must be kept^1 in order to detect the safety and timing failures
as mentioned before and to construct the ready set, which is described in the next section.
Therefore, the set $D$ of unnecessary past variables can be defined as follows.

^1 In the actual implementation, in order to reduce the number of variables, input transitions with
$[0, \infty]$ are handled in the same way as [6] instead of keeping their true parents.

$D = \{v \mid$ for every $u \in Y(v)$ and $x \in Z(u)$, either (1) $u^{\bullet} \cap {}^{\bullet}x \not\subseteq \mu'$, or
(2) $x \notin enabled(\mu')$ and $no\_parent(u, x, \mu', J_4)$, or
(3) $x \in enabled(\mu')$ and $ub(x) = \infty$ and $no\_safety\_failure(x, \mu', J_4)\}$,

$Y(v) = \{u \mid v = out\_trans(u)\}$,

$Z(u) = \{x \mid x$ is an output transition such that $u^{\bullet} \cap {}^{\bullet}x \ne \emptyset\}$,

where $no\_parent(u, x, \mu, J)$ is a predicate representing that $out\_trans(u)$ can never become
a true parent of the transition $x$ under $\mu$ and $J$. Formally, it is true iff for some
$p \in {}^{\bullet}x - u^{\bullet} - \mu$, and for every pair $(y, w)$ with $w \in {}^{\bullet}p$ and $y \in enabled(\mu)$,
$Solution(\{out\_trans(u) > y + diff(y, w)\} \cup J) = \emptyset$, where $diff(t_1, t_2)$ represents the minimal
value of the sum of earliest firing times over all paths (input transitions are linked to their
corresponding output transitions, and the earliest firing times of output transitions are used)
from an output transition $t_1$ to another output transition $t_2$, with $lb(t_1)$ not included. In addition,
$no\_safety\_failure(x, \mu, J)$ is a predicate representing that a safety failure can no longer
occur due to an input transition $x$, and so, it is not necessary to keep the true parent
of $x$ in order to check the safety failure. Formally, it is true iff for all $t \in enabled(\mu) \cap O$,
$Solution(\{Eft(x) > \hat{t}\} \cup J) = \emptyset$ holds. Hence, the inequality part $K'$ of a successor
state of $s$ by $t$ is obtained as $K' = delete(J_4, D)$.

The state space is traversed in a depth-first manner, checking safety and
timing failures, as long as a new successor state that is not covered by any other reached
state is obtained, where a state $s = (\mu, K)$ is covered by a state $s' = (\mu', K')$ if $\mu = \mu'$
and the solution set of $K$ is a subset of that of $K'$.
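
The traversal with the covering check might be organized as in the following sketch. It assumes that the inequality part of a state is stored as difference bounds (a dictionary mapping a pair $(x, y)$ to $c$, meaning $x - y \le c$) and that the bounds of the state being tested are already in their tightest form; `successors` and `is_failure` are hypothetical callbacks standing in for the successor computation and the failure conditions.

```python
import math


def covered(state, other):
    """True if `state` is covered by `other`: same marking and zone inclusion.

    The zone of `state` is assumed to be in closed (tightest) form, so it is
    included in the zone of `other` iff it implies every constraint of `other`.
    """
    (mu, k), (mu2, k2) = state, other
    return mu == mu2 and all(k.get(pair, math.inf) <= c for pair, c in k2.items())


def explore(initial, successors, is_failure):
    """Depth-first traversal that prunes states covered by a reached state."""
    reached, failures, stack = [], [], [initial]
    while stack:
        s = stack.pop()
        if any(covered(s, r) for r in reached):
            continue  # nothing new can be found below a covered state
        reached.append(s)
        if is_failure(s):
            failures.append(s)
        stack.extend(successors(s))
    return reached, failures


# Toy example: two successors share a marking, and one zone contains the other.
init = (frozenset({"p0"}), {("x", "v"): 5})
narrow = (frozenset({"p1"}), {("x", "v"): 3})
wide = (frozenset({"p1"}), {("x", "v"): 7})
reached, failures = explore(init, lambda s: [narrow, wide] if s is init else [],
                            lambda s: False)
print(len(reached))
```

In the toy run the wider state is visited first, so the narrower one is pruned and only two states are stored.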

6 Ready Set Construction 

A ready set must be constructed according to the idea of the partial order reduction for
timed trace theory mentioned in Section 4. Thus, all transitions in the ready set must
satisfy [PT1], [PT2], and [PT3]. For [PT1], some firable output transition is included
in the ready set, if it exists. As for [PT2], if, for example, $t_1$ is in the ready set, and
there exists a disabled transition $t_3$ which is in conflict with $t_1$, then the ready set must
include a firable output transition whose firing is necessary to enable $t_3$ in the future. In
the example shown in Figure 4(b), $t_2$ is such a transition. Furthermore, if $a(out)$ is in the
ready set in Figure 5, then $b(out)$ must be included in the ready set from [PT3]. We call
such $b(out)$ a limiting transition. That is, a limiting transition is a firable output transition
such that its latest firing time point is the smallest among the firable output transitions of all
(semi)modules.

The ready set construction starts from finding a limiting transition. First, consider

$$limiting(s) = \{t \mid \forall u \in firable(s) - \{t\},\ Solution(\{\hat{t} > Lft(out\_trans(u))\} \cup \bigcup_{i=0}^{n-1} K_i) = \emptyset\}.$$

The transitions in $limiting(s)$ certainly have smaller latest firing time points than the
other firable transitions. In the example of Figure 5, $limiting(s_0) = \{b(out)\}$. If it is
nonempty, any singleton subset of $limiting(s)$ can be used as a seed of the ready set,
because firing such a transition does not hide the timing failure anyway. This condition
is, however, sometimes too strong. For example, in the net shown in Figure 6, neither
$a(out)$ nor $c(out)$ satisfies this condition, i.e., $limiting(s) = \emptyset$. This is because the state $s$
does not distinguish situations obtained by different firing times of $b(out)$, i.e., if $b(out)$
fires before time 5, then $c(out)$ has a smaller latest firing time point than $a(out)$, and
otherwise, $a(out)$ has a smaller latest firing time point than $c(out)$. We solve this problem
in a conservative way such that if $limiting(s)$ is empty, then the set of all firable output
transitions is used as a seed of the ready set.

Fig. 6. No firable output transitions are chosen by $limiting(s)$.

The seed of the ready set obtained in this way satisfies [PT1] and [PT3]. In order to
satisfy [PT2], the seed should be extended such that for any transition $t$ in the set, the set
includes the dependent set of $t$, where the dependent set is defined below. Note that this
process is the same as the one presented in [6], and so, only its intuitive idea is described
here. The details are shown in [6].

The necessary set for a transition $t$ is $\{t\}$ if $t$ is enabled. Otherwise, it is a set of
enabled transitions such that $t$ can never be enabled if none of those transitions is fired.
For example, the necessary set for $t_3$ is $\{t_3\}$ for Figure 4(a), and $\{t_2\}$ for Figure 4(b).
The dependent set $dependent(s, t)$ for a transition $t$ in $s$ is a set of transitions that satisfies
(1) $t \in dependent(s, t)$, and (2) if $u \in dependent(s, t)$, then the necessary sets for all
the transitions that conflict with $u$ or with those synchronized with $u$ are included in
$dependent(s, t)$. This set contains transitions that should be fired when $t$ is fired in $s$.
For example, the dependent set for $t_1$ includes $t_3$ for Figure 4(a), and $t_2$ for Figure 4(b). In
the latter case, if the path from $t_2$ to $t_3$ takes a long time, and $t_3$ cannot actually conflict
with $t_1$, then $t_2$ does not have to be in the dependent set. Thus, the sum of the earliest
firing times of transitions in such a path can be used to decide this possibility. Also note
that $dependent(s, t)$ may contain nonfirable transitions. In such a case, all firable output
transitions are used for $dependent(s, t)$, again conservatively.
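
The seed selection and the dependent-set closure can be sketched as a small worklist computation. The sketch assumes that the conflict relation, the synchronization relation, the necessary sets, and $limiting(s)$ have already been computed and are passed in as plain dictionaries and sets; all names are illustrative, the time-based refinement mentioned above is omitted, and the details of the actual procedure are those of [6].

```python
def dependent(t, conflicts, sync, necessary):
    """Least set that contains t and is closed under rule (2)."""
    dep, work = {t}, [t]
    while work:
        u = work.pop()
        # Look at u itself and at the transitions synchronized with u.
        for w in {u} | sync.get(u, set()):
            for c in conflicts.get(w, set()):
                for n in necessary[c]:
                    if n not in dep:
                        dep.add(n)
                        work.append(n)
    return dep


def ready(firable, limiting, conflicts, sync, necessary):
    """Ready set: seed from limiting(s), extended by dependent sets."""
    firable = set(firable)
    # Any single limiting transition may serve as the seed; otherwise be conservative.
    seed = {min(limiting)} if limiting else set(firable)
    result = set()
    for t in seed:
        result |= dependent(t, conflicts, sync, necessary)
    # A nonfirable member forces the conservative choice of all firable transitions.
    return result if result <= firable else firable


# Situation modeled on the description of Figure 4(b): t1 conflicts with the
# disabled t3, and firing t2 is necessary to enable t3.
conflicts = {"t1": {"t3"}, "t3": {"t1"}}
necessary = {"t1": {"t1"}, "t2": {"t2"}, "t3": {"t2"}}
print(sorted(ready({"t1", "t2"}, {"t1"}, conflicts, {}, necessary)))
```

With `t1` as the seed, the closure pulls in `t2`, which matches the requirement of [PT2] that the firing enabling `t3` is not lost.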

Moreover, if some (semi)module contains an independent loop structure, our algorithm
may not terminate [6]. This is because the time differences of concurrent transitions
increase without converging. This situation can be detected by checking that the time
differences exceed some constant value. In such a case, again the set of all firable
transitions is used for the ready set, meaning that the algorithm temporarily reverts to the
full state space enumeration.








Fig. 7. Specification and environment of STARI circuit. 



7 Experimental Results 

To show the performance of the proposed method, we have implemented it within the tool
VINAS-P [6]. This section demonstrates the proposed method with the STARI circuits
[10, 11].

The STARI circuit is composed of a number of FIFO stages. A two-stage STARI
circuit is shown in Figure 7(a). These gates are modeled by time Petri nets. In [11], the
verification of this circuit with respect to the following three properties is demonstrated:
(1) ack1 goes low (i.e., the current data is moved to the next stage) earlier than the
separator is given in (x0t, x0f), (2) new data is ready in (x2t, x2f) earlier than ack3
goes low (i.e., the receiver samples the data), and (3) no hazard occurs at any gate. In
addition to these properties related to the event ordering, we verify the following timing
properties for n-stage STARI circuits with n > 3 in this experiment: (4) ack1 goes low
within 4 time units after new data is sent from the sender and at least 7 time units before
the next separator is sent, and (5) new data is ready in (x2t, x2f) within 2 time units after
ack3 goes high and at least 9 time units before ack3 goes low again. To express these
properties, we use the time Petri net shown in Figure 7(e).

Here, the experiments have been done on a 2.8 GHz Pentium 4 workstation with
4 gigabytes of memory. We have verified the STARI circuits by using the total order
method and the partial order method to compare their performances, where in the total
order method, the ready set contains all firable output transitions. Moreover, we have
hierarchically verified these circuits with the partial order method as well. In this experiment,
we first verify a one-stage STARI circuit with its specification shown in Figure 8.
Note that this specification is obtained by analyzing the behavior of the one-stage STARI
circuit. The verification of such a sub-circuit and its specification should be done for
stages with different initial markings. Once those verifications succeed, every stage-circuit
is replaced by its corresponding specification, which is much smaller than the
circuit model, and it is verified that the set of those sub-specifications conforms to the
original specification (Figure 7(e)).

Fig. 8. Sub-circuit and its specification for hierarchical verification.

Fig. 9. CPU times for verifications of STARI circuits.

Figure 9 shows the CPU times for verifications of n-stage STARI circuits where the 
x-axis shows n. Note that "Partial(hierarchical)" includes the verification runs for sub- 
circuits. These results show that the performance improved by partial order reduction is 
significant, and the hierarchical verification is much more powerful. One disadvantage 
of the hierarchical verification is that sufficient sub-specifications should be prepared by 
users. 

In addition to these experiments, we have run the XOR chain example in [12] to
compare the proposed method with Minea's work. According to our results, the example
is very sensitive to the delay bounds of the XOR gates, which are not specified in his thesis.
Thus, a fair comparison is not easy. One fact is that the total order method outperforms
our partial order method in this example, although it has some amount of concurrency.
This is probably because this example contains many almost independent loops, which
make the visited state checking difficult in the partial order method (the details of the
independent loop handling can be found in [6]).

8 Conclusion

In this paper, we have proposed a partial order reduction algorithm for timed trace
theoretic verification. Our algorithm can hierarchically verify timed circuits and detect
a kind of liveness failure (i.e., timing failures). Experimental results obtained from the
STARI circuits by using the partial order reduction show the effectiveness of the proposed
method. We are planning to do a case study that verifies a practical system in order to show the
usefulness of the proposed method.

Abstract. When engineers design a system, there is always the question
of how exhaustively the system has been examined for correctness. Coverage
estimation provides an answer to this question in testing. A model
checker verifies a design exhaustively and proves the satisfaction of property
specifications. However, design errors have been found to exist
even after model checking is done, which shows that the question
"How complete is the model checking once done?" is still left relatively
unaddressed by model checkers, except for some state-based coverage
metrics and the coverage estimator for symbolic simulation in RED. As
a more complete solution, we propose several structural mutation models
and coverage metrics to cover different design aspects in a state graph
and to estimate the completeness of model checking, respectively. Once
a system state graph satisfies a given set of property specifications, we
estimate the coverage of completeness for the set of properties by applying
some mutations to the state graph and checking if the given set
of properties is sensitive to the mutation. Our experience with five application
examples demonstrates how the proposed coverage estimation
methodology helps verification engineers find the uncovered holes.



1 Introduction 

A model checker explores all possible states of a system model and proves whether
it satisfies a given set of property specifications. If a system does not satisfy
a property specification, a model checker will provide a counterexample, that
is, a system computation that shows how it ran into a wrong state. With the
capability to exhaustively verify a system, this method has successfully verified
complex circuit designs and communication protocols. However, model checking
still faces some obstacles: we are not sure whether a design will function correctly even
after model checking. At worst, we may have checked that a model satisfies a
vacuous property, which does not check anything meaningful. Take as an example a
property $AG(req \rightarrow AF(granted))$, which checks whether a model is starvation-free. If
no requests are issued in a system model, this property is indeed true. Although
we can detect such vacuous properties [3], how do we know whether a given
set of properties fully verifies the whole model? Due to this need, Hoskote et al.
[9] defined a state coverage metric in symbolic model checking. The computation
algorithm is based on mutating an observed signal at a certain state, and
this state is covered if the truth values of properties are changed. Other
research works [5,6,7,10] are all based on this state coverage metric, which does not
consider the computation runs or behavior of a system model. To obtain a more
comprehensive coverage, we propose coverage metrics that not only consider
the individual states, but also take the behavior of models into consideration.
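
The vacuity problem mentioned above can be illustrated on a tiny explicit state graph. The following sketch assumes a total transition relation given as a dictionary and state labels given as sets; it checks $AG(req \rightarrow AF(granted))$ by a least fixpoint for $AF$ and reports the property as vacuous when no reachable state issues a request. This covers only the antecedent-failure case and is not the vacuity detection algorithm of [3]; all names are illustrative.

```python
def reachable(init, succ):
    """All states reachable from init."""
    seen, stack = {init}, [init]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def af(states, succ, target):
    """States from which every path eventually reaches `target` (least fixpoint)."""
    sat = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in sat and succ[s] and all(n in sat for n in succ[s]):
                sat.add(s)
                changed = True
    return sat


def check_starvation_freedom(init, succ, req, granted):
    """Return (holds, vacuous) for AG(req -> AF granted)."""
    states = reachable(init, succ)
    eventually_granted = af(states, succ, granted & states)
    holds = all(s in eventually_granted for s in req & states)
    vacuous = holds and not (req & states)
    return holds, vacuous


# A model that never issues a request satisfies the property, but only vacuously.
silent = {"idle": ["idle"]}
served = {"idle": ["wait"], "wait": ["grant"], "grant": ["idle"]}
print(check_starvation_freedom("idle", silent, {"wait"}, {"grant"}))
print(check_starvation_freedom("idle", served, {"wait"}, {"grant"}))
```

Both models pass the check, yet only the second one exercises the request and grant behavior, which is the kind of difference a coverage metric is meant to expose.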

Coverage has been widely applied in simulation and testing in recent
decades. Various metrics have been proposed and applied to real-world cases to
assess the progress of verification. Low coverage allows many interesting
corner cases to escape, which leads to functional failures. In contrast,
100% coverage means that the corresponding kind of error is fully verified by the
given test patterns. Model checking faces the same situation: the given properties
may overlook some desired behaviors of a system. Based on our verification
experience, we know there is much in common between simulation and formal
verification. Because of this similarity, we can adapt existing test-based coverage
techniques to ensure the quality and reliability of model checking. Some prior
research also tried to apply coverage estimation to formal verification. Some of the
mutation metrics we propose here are inspired by coverage metrics in testing.

In contrast to state coverage metrics [5,6,7,9,10] being static estimates on 
model structure, our proposed metrics estimate the coverage of behavior for a 
system design model. Six mutation models and coverage metrics are proposed 
to address different aspects of a system design. We implement the six metrics 
for coverage estimation, and integrate them into overall coverages. We provide a 
comprehensive estimation of the target model with respect to the specifications. 
We can prove that the proposed metrics are suitable for analyzing the model 
checking efforts. 

The remainder of this article is organized as follows. Section 2 reviews the coverage
approaches in simulation and pioneering research on improving the quality of formal
verification. Some related definitions are given in Section 3. Section 4
formulates each of our metrics and shows how the coverage values of the metrics are
calculated. The overall estimation methodology is also described in that section.
In Section 5, we apply the new methodology to some real cases: we employ
coverage analysis to find the uncovered holes and specify more detailed properties
to uncover design holes. The article is concluded and future research directions
are given in Section 6.



2 Previous Work 

Coverage estimation has been used to assess the quality of design and
implementation verification for a long time. For both software and hardware, we feed
test cases into a program or a circuit and observe whether the outputs are correct. After
testing the expected scenarios, most verification engineers consider the verification
done and prepare to tape out. However, as designs become more and more
complicated, more unexpected errors escape from verification. Some of them
cause huge financial damage, even threatening people's lives. To achieve more
reliable verification, coverage estimation is generally employed. Coverage
provides a quantitative way to assess how thorough the verification has been.
Coverage-driven testing has become the mainstream of modern hardware simulation;
it uses coverage to direct the generation of test vectors for exercising
untested functions. Many coverage metrics have been proposed and applied
in simulation, each created for a different need.

Code coverage [13] measures how completely a piece of HDL code has been
exercised by a given set of test patterns. For example, statement coverage
calculates the percentage of lines of code stimulated during simulation, so that
designers can redirect the generation of the test suite or modify the HDL program
to gain higher coverage. However, the interaction between modules, simultaneous
events, and sequences of events are not evaluated. In spite of these drawbacks,
this kind of metric is simple and easy to understand, and verification engineers treat
it as a basic requirement of design validation.

Fault simulation [14] is widely employed in gate-level circuit testing. It
estimates the fault coverage with respect to several kinds of fault models. By
simulating the test patterns on faulty models, we can calculate how many injected
faults are detected by comparing the output to the original one. Fault coverage
is calculated as the proportion of detected faults to injected faults. Some
gate-level fault models can also be applied to higher-level designs [1,12]. We exploit
the principle of these techniques in our proposed formal approach. Experience
in testing shows that fault simulation is an effective way to reflect the target
faults and to direct test pattern generation. Applying adequate fault models
can ensure that the faults of interest are under verification.

Some other prior approaches have been proposed to address the same problem
in formal verification. Vacuity detection [3] of the specification provides an
answer on the validity of individual properties. Some properties are trivially true,
for example because of antecedent failure. In the experience reported there, about 20 percent of the formulae passed
vacuously during the first verification. Vacuity detection can filter out those
meaningless properties and decrease the verification load. Another approach is to estimate
the coverage of a specification with respect to a system model [9,11]. Katz et al.
[11] suggested that the reduced tableau of a set of ACTL properties should be
bisimilar to the implementation. The relevance between the implementation and
the specification is compared to estimate verification coverage and to find out
the incompleteness of the model or the insufficiency of the specification. Hoskote
et al. [9] proposed a state coverage metric, which was inspired by mutation
coverage [17]. They apply a mutation to an observable signal in a particular state. A
state is covered if the mutation leads to the violation of a given property. The
authors of [10] improve the algorithm from [9] to be more efficient and general.
The seminal work of Hoskote et al. [9] was also extended to LTL model checking
[5], to full CTL model checking [6], and to the simulation of specifications [4].
However, all of these works are based on the state coverage metric. Chockler et al.
[7] made a brief survey of several different kinds of coverage metrics for formal
verification, but did not address the practicality of the metrics, nor provide any
application examples to show their usability.

Wang et al. proposed several numerical coverage metrics for the symbolic
simulation of real-time systems [16]. Four criteria were introduced for analyzing
coverage metric properties. Coverage metrics were also proposed for region spaces
in that work. The coverage estimator for symbolic simulation was implemented
in the RED model checker for real-time systems.

In this work, we propose a formal coverage estimation methodology for
analyzing the completeness of a set of specification properties in the model checking
of real-time systems. We use a mutation-based approach where a state-graph
model is mutated and it is checked whether a given set of properties can distinguish
the mutation. Six different mutation models and corresponding coverage metrics
are proposed. In contrast to conventional state-based metrics, we also propose
behavioral (transition-based) metrics, which are similar in some respects to those
proposed in [16], except that they are used for estimating the model coverage by
a set of properties. The proposed mutation models and coverage metrics were
implemented in the State-Graph Manipulators (SGM) model checker [15].
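
The general mutation-based scheme can be sketched as follows. The sketch uses an untimed state graph stored as a dictionary of successor sets, a single illustrative mutation (removal of one transition), and properties given as Python predicates over the graph. None of this reproduces the six mutation models or the SGM implementation; it only shows how a coverage value arises as the fraction of mutations that the property set detects.

```python
def reachable(init, succ):
    """All states reachable from init."""
    seen, stack = {init}, [init]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def drop_edge(succ, edge):
    """Illustrative mutation: remove one transition from the state graph."""
    src, dst = edge
    return {s: (ns - {dst} if s == src else set(ns)) for s, ns in succ.items()}


def mutation_coverage(succ, mutants, properties):
    """Fraction of mutations detected by at least one property."""
    detected = 0
    for edge in mutants:
        mutated = drop_edge(succ, edge)
        if not all(prop(mutated) for prop in properties):
            detected += 1
    return detected / len(mutants)


graph = {"idle": {"wait"}, "wait": {"grant"}, "grant": {"idle"}}
edges = [(s, d) for s, ds in graph.items() for d in ds]
grant_reachable = lambda g: "grant" in reachable("idle", g)
print(mutation_coverage(graph, edges, [grant_reachable]))
```

In this toy run the property does not notice the removal of the transition from `grant` back to `idle`, which points to a behavior that the property set leaves unchecked.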



3 System Model and Specification 

Our system model with real-time clocks is based on the timed automata model 
[2], which is defined as follows. 

Definition 1. Mode Predicate

Given a set C of clock variables and a set D of discrete variables, the syntax of
a mode predicate $\eta$ over C and D is defined as:
$\eta := \mathit{false} \mid x \sim c \mid x - y \sim c \mid d \sim c \mid \eta_1 \wedge \eta_2 \mid \neg\eta_1$,
where $x, y \in C$, $\sim \in \{\le, <, =, \ge, >\}$, $c \in \mathbb{N}$, $d \in D$, and
$\eta_1, \eta_2$ are mode predicates.

Let $B(C, D)$ be the set of all mode predicates over C and D.
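To make the grammar of Definition 1 concrete, the following is a minimal sketch of an evaluator for mode predicates. The nested-tuple encoding, the function name `holds`, and the operator table are assumptions made for illustration; they are not part of the SGM implementation.

```python
import operator

# Comparison operators permitted in a mode predicate (Definition 1).
OPS = {"<=": operator.le, "<": operator.lt, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def holds(pred, clocks, discretes):
    """Evaluate a mode predicate encoded as a nested tuple (hypothetical encoding)."""
    kind = pred[0]
    if kind == "false":
        return False
    if kind == "clock":            # x ~ c
        _, x, op, c = pred
        return OPS[op](clocks[x], c)
    if kind == "diff":             # x - y ~ c
        _, x, y, op, c = pred
        return OPS[op](clocks[x] - clocks[y], c)
    if kind == "disc":             # d ~ c
        _, d, op, c = pred
        return OPS[op](discretes[d], c)
    if kind == "and":              # eta_1 AND eta_2
        return holds(pred[1], clocks, discretes) and holds(pred[2], clocks, discretes)
    if kind == "not":              # NOT eta_1
        return not holds(pred[1], clocks, discretes)
    raise ValueError(f"unknown predicate kind: {kind}")

# x <= 5 AND NOT (reset == 1), evaluated at x = 3.0, reset = 0
example = ("and", ("clock", "x", "<=", 5), ("not", ("disc", "reset", "==", 1)))
result = holds(example, {"x": 3.0}, {"reset": 0})
```

Each branch corresponds to one production of the grammar, so extending the predicate language would only require a new branch.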

Definition 2. Timed Automaton

A Timed Automaton (TA) is a tuple $A_i = (M_i, m_i^0, C_i, D_i, L_i, \chi_i, T_i, \lambda_i, \tau_i, \rho_i)$
such that: $M_i$ is a finite set of modes, $m_i^0 \in M_i$ is the initial mode, $C_i$ is a set
of clock variables, $D_i$ is a set of discrete variables, $L_i$ is a set of synchronization
labels, and $\varepsilon \in L_i$ is a special label that represents asynchronous behavior (i.e.,
no need of synchronization), $\chi_i : M_i \mapsto B(C_i, D_i)$ is an invariance function
that labels each mode with a condition true in that mode, $T_i \subseteq M_i \times M_i$ is
a set of transitions, $\lambda_i : T_i \mapsto L_i$ associates a synchronization label with a
transition, $\tau_i : T_i \mapsto B(C_i, D_i)$ defines the transition triggering conditions, and
$\rho_i : T_i \mapsto 2^{C_i \cup (D_i \times \mathbb{N})}$ is an assignment function that maps each transition to
a set of assignments such as resetting some clock variables and setting some
discrete variables to specific integer values.

A system state space is represented by a system state graph as defined in 
Definition 3. 

Definition 3. System State Graph 

Given a system S with n components modeled by
$A_i = (M_i, m_i^0, C_i, D_i, L_i, \chi_i, T_i, \lambda_i, \tau_i, \rho_i)$, $1 \le i \le n$, the system model is
defined as a state graph represented by
$A_1 \times \ldots \times A_n = A_S = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, where:

- $M = M_1 \times M_2 \times \ldots \times M_n$ is a finite set of system modes; for a system mode
  $m = m_1.m_2 \ldots m_n \in M$, we use the shorthand $m(i)$ to denote the mode $m_i$,

- $m^0 = m_1^0.m_2^0 \ldots m_n^0 \in M$ is the initial system mode,

- $C = \bigcup_i C_i$ is the union of all sets of clock variables in the system,

- $D = \bigcup_i D_i$ is the union of all sets of discrete variables in the system,

- $L = \bigcup_i L_i$ is the union of all sets of synchronization labels in the system,

- $\chi : M \mapsto B(\bigcup_i C_i, \bigcup_i D_i)$, $\chi(m) = \bigwedge_i \chi_i(m_i)$, where $m = m_1.m_2 \ldots m_n \in M$,

- $T \subseteq M \times M$ is a set of system transitions which consists of two types of
  transitions:

  • Asynchronous transitions: $\exists i$, $1 \le i \le n$, $e_i \in T_i$ such that $e_i = e \in T$

  • Synchronized transitions: $\exists i, j$, $1 \le i \ne j \le n$, $e_i \in T_i$, $e_j \in T_j$ such that
    $\lambda_i(e_i) = (l, in)$, $\lambda_j(e_j) = (l, out)$, $l \in L_i \cap L_j$, and $e \in T$ is the synchronization
    of $e_i$ and $e_j$ with conjuncted triggering conditions and union of all
    transition assignments (defined later in this definition),

- $\lambda : T \mapsto L$ associates a synchronization label with a transition, which
  represents a blocking signal that was synchronized, except for $\varepsilon \in L$,

- $\tau : T \mapsto B(\bigcup_i C_i, \bigcup_i D_i)$, $\tau(e) = \tau_i(e_i)$ for an asynchronous transition and
  $\tau(e) = \tau_i(e_i) \wedge \tau_j(e_j)$ for a synchronous transition, and

- $\rho : T \mapsto 2^{C \cup (D \times \mathbb{N})}$, $\rho(e) = \rho_i(e_i)$ for an asynchronous transition and
  $\rho(e) = \rho_i(e_i) \cup \rho_j(e_j)$ for a synchronous transition. □
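The two transition types of Definition 3 can be illustrated with a small sketch that composes two components. This is a simplification under stated assumptions: guards, invariants, and assignments are omitted, a transition is encoded as the hypothetical tuple `(source, destination, label, direction)`, and the label `"eps"` stands for the asynchronous label.

```python
from itertools import product

def compose(a1, a2):
    """Product of two components given as dicts with 'modes', 'init', 'trans'.

    Only the mode and transition structure is composed; triggering conditions
    and assignments are left out of this sketch.
    """
    modes = set(product(a1["modes"], a2["modes"]))
    trans = set()
    for (m1, m2) in modes:
        # Asynchronous transitions: one component moves alone on label eps.
        for (s, d, l, _) in a1["trans"]:
            if s == m1 and l == "eps":
                trans.add(((m1, m2), (d, m2), "eps"))
        for (s, d, l, _) in a2["trans"]:
            if s == m2 and l == "eps":
                trans.add(((m1, m2), (m1, d), "eps"))
        # Synchronized transitions: same label, one side "in", the other "out".
        for (s1, d1, l1, dir1) in a1["trans"]:
            for (s2, d2, l2, dir2) in a2["trans"]:
                if ((s1, s2) == (m1, m2) and l1 == l2 and l1 != "eps"
                        and {dir1, dir2} == {"in", "out"}):
                    trans.add(((m1, m2), (d1, d2), l1))
    return {"modes": modes, "init": (a1["init"], a2["init"]), "trans": trans}

sender = {"modes": {"s0", "s1"}, "init": "s0",
          "trans": {("s0", "s1", "req", "out")}}
receiver = {"modes": {"r0", "r1"}, "init": "r0",
            "trans": {("r0", "r1", "req", "in"), ("r1", "r0", "eps", None)}}
system = compose(sender, receiver)
```

In this toy system the only synchronized transition pairs the sender's `req` output with the receiver's `req` input; the receiver's return to `r0` is asynchronous and appears once for each sender mode.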

For hardware and software designs, a property specification is usually ex- 
pressed in some temporal logic. The SGM model checker chooses TCTL as its 
logical formalism, as defined below. 

Definition 4. Timed Computation Tree Logic (TCTL)

A timed computation tree logic formula has the following syntax:

$\phi ::= \eta \mid EG\,\phi' \mid E\,\phi'\,U_{\sim c}\,\phi'' \mid \neg\phi' \mid \phi' \vee \phi''$,

where $\eta$ is a mode predicate, $\phi'$ and $\phi''$ are TCTL formulae, $\sim \in \{\le, <, =, \ge, >\}$,
and $c \in \mathbb{N}$. $EG\,\phi'$ means there is a computation from the current state, along
which $\phi'$ is always true. $E\,\phi'\,U_{\sim c}\,\phi''$ means there exists a computation from the
current state, along which $\phi'$ is true until $\phi''$ becomes true, within the time
constraint of $\sim c$. Traditional shorthands like $EF$, $AF$, $AG$, $AU$, $\wedge$, and $\rightarrow$
can all be defined [8].
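For reference, some of these shorthands can be written out using the usual CTL-style equivalences. The block below is a sketch of the standard definitions rather than a quotation from [8]; the $AU$ shorthand is omitted because its expansion is longer.

```latex
\begin{align*}
EF_{\sim c}\,\phi       &\equiv E\,\mathit{true}\;U_{\sim c}\;\phi \\
AG_{\sim c}\,\phi       &\equiv \neg EF_{\sim c}\,\neg\phi \\
AF\,\phi                &\equiv \neg EG\,\neg\phi \\
\phi' \wedge \phi''     &\equiv \neg(\neg\phi' \vee \neg\phi'') \\
\phi' \rightarrow \phi'' &\equiv \neg\phi' \vee \phi''
\end{align*}
```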

4 Mutation Coverage Estimation 

Coverage estimation techniques for formal verification such as model checking
are not as mature as those for simulation-based verification. A well-studied metric
is a purely state-based one [4, 5, 6, 9], where an observed signal in a state is flipped
(value toggled) to check if the satisfaction of any user-given property is affected
(model checking result toggled). If the model checking result differs after toggling
a signal value in a state, then the state is said to be covered. Intuitively, besides
an observed signal in a state, there are other basic elements in a state-transition
model that also need to be covered [7, 16]. This work provides some basic insights
into coverage estimation techniques based on mutating other elements of a
system model.

This work is based on mutation coverage, which makes changes to a system 
model and then checks if a given property suite can detect those changes.
Mutating a system model can be done in two ways: (1) Semantic Mutation, that is,
changing the value of some basic elements in the model, e.g., variable values in a
state, clock resets along a transition, etc., and (2) Structural Mutation, that is, 
changing the structure of the model, e.g., inserting or deleting a state or a tran- 
sition. Currently, our work is focused on structural mutation for timed automata 
with timed computation tree logic properties. 

Similar to the conventional fault models in simulation-based testing, we pro- 
pose structural mutation models that can be applied to system state-space rep- 
resentations such as state graphs. Henceforth, unless explicitly mentioned, we 
will simply use mutation models to denote structural mutation models. 

A mutation model is a small change that is applied to some basic element
of a system model such as a mode or a transition of a state graph. Given a
state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$ and a mutation model $\mu$ that is
applicable to some basic element $b \in M \cup T$, the resulting state graph is called a
mutated state graph and is denoted as $A^{\mu(b)} = (M', m'^0, C, D, L, \chi', T', \lambda', \tau', \rho')$.
If $A^{\mu(b)}, m'^0 \not\models \phi$ for some $\phi \in P$, then the mutation $\mu(b)$ is said to be covered
by $\phi$ and the mode or transition $b$ is said to be covered by $\phi$. A property $\phi$ is
said to be insensitive to a mutation $\mu(b)$ on a state graph $A$ if $A^{\mu(b)}, m'^0 \models \phi$
non-vacuously. A mutation $\mu(b)$ is said to be not covered by $P$ if $A^{\mu(b)}, m'^0 \models \phi$
for all $\phi \in P$.



4.1 Coverage Estimation Methodology 

Our target mutation coverage estimation problem can be formulated as follows.
Given a system modeled by a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, a
set of properties specified as TCTL formulae $P = \{\phi\}$, and a mutation model
$\mu$, suppose $A, m^0 \models \phi$ for all $\phi \in P$; estimate the completeness of $P$ with respect to
$A$ and $\mu$, where completeness is the fraction of the mutations detected by some
property in $P$.

As shown in Figure 1, a solution to the above stated problem is proposed
as a mutation coverage estimation procedure that can be integrated with model
checking. Given a system state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$ and a
set $P = \{\phi\}$ of TCTL property specifications, the mutation coverage estimation
procedure starts only after it is verified that $A, m^0 \models \phi$ for all $\phi \in P$. In
other words, if a system state graph violates some given properties, a designer
should revise the model or the properties before estimating the coverage. When
a state graph satisfies all given properties, a mutation model $\mu$ is applied to
some basic element $b \in M \cup T$ of the graph to obtain a mutated state graph
$A^{\mu(b)}$. Model checking is re-performed on the mutated graph and it is checked if
the mutation is covered (Definition 5). The application of the mutation model and
the model checking are repeated for each mode or each transition depending on
the mutation model. After coverage estimation, if the value of a coverage metric
is too low, that is, smaller than some user-defined threshold value, then more
properties are to be specified by analyzing the uncovered parts in a state graph.

Fig. 1. Coverage Estimation Methodology

Definition 5. Covered Mutation

For a given system state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$ and a set of
TCTL properties $P = \{\phi\}$, suppose $A, m^0 \models \phi$ for all $\phi \in P$. Further, for a given
mutation model $\mu$ that is applied on some element $b \in M \cup T$ of the state graph
$A$, suppose the mutated state graph is $A^{\mu(b)} = (M', m'^0, C, D, L, \chi', T', \lambda', \tau', \rho')$.
If $A^{\mu(b)}, m'^0 \not\models \phi$ for some $\phi \in P$, then the mutation $\mu(b)$ is said to be covered
by $P$ for $A$.
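The procedure and Definition 5 can be sketched as a short loop. The sketch below makes strong simplifying assumptions: the graph is untimed, a property is any Python predicate over a graph (standing in for a model-checking call), and the names `Graph`, `covered_mutations`, and `remove_transition` are hypothetical.

```python
from typing import NamedTuple

class Graph(NamedTuple):
    modes: frozenset
    init: str
    trans: frozenset          # set of (source, destination) pairs

def reachable(g):
    """Modes reachable from the initial mode."""
    seen, stack = {g.init}, [g.init]
    while stack:
        m = stack.pop()
        for (s, d) in g.trans:
            if s == m and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

def covered_mutations(g, props, elements, mutate):
    """Return the elements b whose mutation falsifies at least one property."""
    # Precondition of the methodology: the unmutated graph satisfies all properties.
    if not all(p(g) for p in props):
        raise ValueError("revise the model or the properties before estimating coverage")
    return {b for b in elements if any(not p(mutate(g, b)) for p in props)}

def remove_transition(g, e):
    """A toy mutation: delete one transition (no pruning of unreachable parts)."""
    return g._replace(trans=g.trans - {e})

g = Graph(frozenset({"a", "b", "c"}), "a",
          frozenset({("a", "b"), ("b", "c"), ("a", "c")}))
props = [lambda h: "b" in reachable(h)]      # stands in for the property EF(b)
covered = covered_mutations(g, props, g.trans, remove_transition)
coverage = len(covered) / len(g.trans)
```

Here only the removal of `("a", "b")` changes the verdict of the single property, so one of three mutations is covered.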

In this work, we propose several mutation models and corresponding 
mutation-based coverage metrics to aid engineers in analyzing if a set of proper- 
ties has covered most functionalities of a system. The proposed coverage metrics 
are calculated automatically without any effort beyond model checking. Ideally, 
the coverage should achieve 100% for each proposed metric, which implies the
given set of specifications has covered every corner of a system design in each 
design aspect. However, when a state space becomes large, it is really hard to 
achieve 100% coverage for each metric, especially the transition-based ones. 

The value of a coverage metric $cov(A, P, \mu)$ with respect to a mutation model
$\mu$ for a state graph $A$ and a set of properties $P$ is the ratio of the number of covered
mutations to the total number of mutations applied, where covered mutations are
as defined in Definition 5 and the total number of mutations is the total number of
basic elements reachable in a state graph, to which the corresponding mutation
was applied.

$cov(A, P, \mu) = \frac{\#\text{Covered Mutations}}{\#\text{Total Mutations}} = \frac{|\{b \mid \exists \phi \in P,\ A^{\mu(b)}, m'^0 \not\models \phi\}|}{|\{b\}|}$    (1)

In the above, the total number of mutations differs based on whether the
mutation was applied to system modes or transitions.

For a given state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$ and a set $\{\phi\}$ of
TCTL formulae, the complexity of labeling in model checking $A$ against $\phi$ is
$O(|\phi| \times |A|)$, where $|\phi|$ is the number of sub-formulae in $\phi$ and $|A| = |M| + |T|$
is the size of the state graph. The complexity of our proposed mutation-based
coverage estimation is $O(\text{model checking}) \times |M|$ for state-based mutations and
$O(\text{model checking}) \times |T|$ for transition-based mutations.

If any property in the given set of TCTL properties has zero coverage for all
metrics, this property is said to be vacuous. This feature is similar to vacuity
detection [3].



4.2 Mutation Models and Coverage Metrics 

Based on our verification experiences, we propose six different structural
mutation models and corresponding coverage metrics. Each mutation model
characterizes a different aspect of a system state graph and identifies different
characteristics of TCTL properties. In general, a more restrictive property will cover
more parts of a system model. For example, the shorter the time interval in a TCTL
formula, the larger the coverage. In contrast, the satisfaction
of an eventuality property $EF\phi$ only requires a state satisfying $\phi$ along some
path, so it typically covers few states, but several transitions. In the following,
we will assess each mutation model and corresponding coverage metric when
verifying different kinds of properties. Metrics can also be combined to
give overall estimations.



Mutated Initial. This model is a mode-based mutation that changes the
initial mode of a system state graph to be another one. Given a state graph
$A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, the mutated initial model $\mu_{initial}$ when applied to
a non-initial mode $m \in M$, $m \ne m^0$, gives a mutated state graph
$A^{\mu_{initial}(m)} = (M, m, C, D, L, \chi, T, \lambda, \tau, \rho)$.

An initial state specifies the initial values of all variables in the system. For
a good system design, each initial value should be explicitly specified, that is,
there should be a unique initial state. If a property specifies initial values for
all variables, that is, it is satisfied only in a particular initial state, then the
corresponding coverage estimation metric will take a value of 100%. Otherwise,
it will be the percentage of non-initial states that cannot play
the role of an initial state. A formal definition of the coverage metric for this
mutation model is given in Equation (2).

$cov(A, P, \mu_{initial}) = \frac{|\{m \mid m \in M,\ m \ne m^0,\ \exists \phi \in P,\ A^{\mu_{initial}(m)}, m \not\models \phi\}|}{|M| - 1}$    (2)

Because this model changes only the initial mode to another one and because
the labeling algorithm in model checking does not distinguish between initial
and non-initial states, the labels in the modes need not be re-computed. Hence,
the coverage estimation procedure for this mutation model takes constant time
for each iteration.
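The reuse of labels can be sketched as follows, assuming an untimed graph and a single property of the form EF(target). The function `label_EF` and the toy graph are hypothetical; the point is only that the labeling runs once and each candidate initial mode is then checked by a set lookup.

```python
def label_EF(trans, target):
    """Backward fixpoint: modes from which some mode in `target` is reachable."""
    sat = set(target)
    changed = True
    while changed:
        changed = False
        for (s, d) in trans:
            if d in sat and s not in sat:
                sat.add(s)
                changed = True
    return sat

modes = {"m0", "m1", "m2", "m3"}
trans = {("m0", "m1"), ("m1", "m2"), ("m3", "m3")}
init = "m0"

sat = label_EF(trans, {"m2"})          # labels are computed once
holds_originally = init in sat         # the unmutated graph satisfies EF(m2)

# Each mutated initial mode costs one lookup: it is covered if the property fails there.
covered = {m for m in modes if m != init and m not in sat}
coverage = len(covered) / (len(modes) - 1)
```

In this example only `m3` cannot play the role of the initial mode, giving a coverage of one third.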



Delayed Transition. This model is a transition-based mutation that delays
a mode transition by inserting a new mode between the source and destination
modes of the transition. For each transition $e = (m_s, m_d) \in T$, a new mode $m$
is inserted between $m_s$ and $m_d$ such that the following conditions are met: (1)
$\chi(m) = \chi(m_s)$, (2) the transition $e' = (m_s, m)$ is non-deterministically timed,
that is, $\tau(e') = true$, and (3) the transition $e$ is now changed to originate from $m$,
that is, $e = (m, m_d)$. Given a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, the
delayed transition model $\mu_{delay}$ when applied to a transition $e = (m_s, m_d) \in T$
gives a mutated state graph $A^{\mu_{delay}(e)} = (M', m^0, C, D, L, \chi', T', \lambda', \tau', \rho')$, where
$M' = M \cup \{m_{delay}\}$, $m_{delay}$ is a newly introduced mode, $\chi'(m) = \chi(m), \forall m \in M$,
and $\chi'(m_{delay}) = \chi(m_s)$, $T' = T \cup \{(m_s, m_{delay}), (m_{delay}, m_d)\} \setminus \{e\}$,
$\lambda'(e') = \lambda(e'), \forall e' \in T \setminus \{e\}$, $\lambda'((m_s, m_{delay})) = \varepsilon$, $\lambda'((m_{delay}, m_d)) = \lambda(e)$,
$\tau'(e') = \tau(e'), \forall e' \in T \setminus \{e\}$, $\tau'((m_s, m_{delay})) = true$, $\tau'((m_{delay}, m_d)) = \tau(e)$,
$\rho'(e') = \rho(e'), \forall e' \in T \setminus \{e\}$, $\rho'((m_s, m_{delay})) = \emptyset$, and $\rho'((m_{delay}, m_d)) = \rho(e)$.
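As an illustration, the delayed transition mutation can be sketched on a dictionary-based graph. The encoding (keys `modes`, `init`, `inv`, `trans`, `label`, `guard`, `assign`, with guards and invariants kept as strings) and the name `m_delay` are hypothetical, and the invariant chosen for the new mode is an assumption of this sketch.

```python
def delay_transition(graph, e, new_mode="m_delay"):
    """Return a copy of `graph` in which transition e = (ms, md) passes through a new mode."""
    ms, md = e
    first, second = (ms, new_mode), (new_mode, md)
    inv = dict(graph["inv"])
    inv[new_mode] = inv[ms]   # assumption: the delay mode inherits the source invariant
    # All transitions other than e keep their labels, triggers, and assignments.
    label = {t: v for t, v in graph["label"].items() if t != e}
    guard = {t: v for t, v in graph["guard"].items() if t != e}
    assign = {t: v for t, v in graph["assign"].items() if t != e}
    # The first half is unlabeled, always enabled, and assigns nothing.
    label[first], guard[first], assign[first] = "eps", "true", frozenset()
    # The second half carries everything the original transition carried.
    label[second] = graph["label"][e]
    guard[second] = graph["guard"][e]
    assign[second] = graph["assign"][e]
    return {"modes": set(graph["modes"]) | {new_mode},
            "init": graph["init"],
            "inv": inv,
            "trans": (set(graph["trans"]) - {e}) | {first, second},
            "label": label, "guard": guard, "assign": assign}

graph = {"modes": {"a", "b"}, "init": "a",
         "inv": {"a": "x<=5", "b": "true"},
         "trans": {("a", "b")},
         "label": {("a", "b"): "go"},
         "guard": {("a", "b"): "x>=2"},
         "assign": {("a", "b"): frozenset({"x"})}}
mutated = delay_transition(graph, ("a", "b"))
```

The function builds new containers rather than modifying `graph`, so the same original graph can be reused for every transition in turn.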

The coverage metric corresponding to the delayed transition model is defined
in Equation (3).

$cov(A, P, \mu_{delay}) = \frac{|\{e \mid e \in T,\ \exists \phi \in P,\ A^{\mu_{delay}(e)}, m^0 \not\models \phi\}|}{|T|}$    (3)

This metric provides a measure of whether the timing in a system design is
correct with respect to the specification. The metric was inspired by the delay
fault models found in simulation-based coverage estimation.



Stuttering Mode. This model is a mode-based mutation that adds a self-loop
transition to a mode. Given a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, the
stuttering mode model $\mu_{stutter}$ when applied to a mode $m \in M$ gives a mutated
state graph $A^{\mu_{stutter}(m)} = (M, m^0, C, D, L, \chi, T', \lambda', \tau', \rho')$, where $T' = T \cup \{e\}$,
$e = (m, m)$, $\lambda'(e) = \varepsilon$, $\tau'(e) = true$, $\rho'(e) = \emptyset$, and all other transitions have
unaltered synchronization labels, triggers, and assignments.
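The effect of a stuttering mutation on an inevitability property can be sketched on an untimed graph. The labeling routine `label_AF`, the helper `stutter`, and the three-mode example are hypothetical stand-ins for the model checker and a real design.

```python
def label_AF(modes, trans, target):
    """Least fixpoint for AF(target) on a finite untimed graph."""
    succ = {m: {d for (s, d) in trans if s == m} for m in modes}
    sat = set(target)
    changed = True
    while changed:
        changed = False
        for m in modes - sat:
            # m satisfies AF(target) if it has successors and all of them do.
            if succ[m] and succ[m] <= sat:
                sat.add(m)
                changed = True
    return sat

def stutter(trans, m):
    """Stuttering mode mutation: add a self-loop on mode m."""
    return set(trans) | {(m, m)}

modes = {"idle", "busy", "done"}
trans = {("idle", "busy"), ("busy", "done"), ("done", "done")}
init = "idle"

holds_before = init in label_AF(modes, trans, {"done"})
covered = {m for m in modes
           if init not in label_AF(modes, stutter(trans, m), {"done"})}
```

Adding a self-loop on `idle` or `busy` creates a run that never reaches `done`, so both modes are covered by AF(done); `done` already loops and is not.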

Applying the stuttering mode mutation model to a mode creates a single-node
strongly connected component (SCC) if the mode itself was not in an SCC
before mutation. SCCs are required for infinite computation runs such as in
checking fairness constraints and in checking CTL properties such as $EG\phi$ and
$AF\phi$. In hardware systems, such a mutation model has the effect of allowing
a component to remain in a particular state forever, while still synchronizing
with a clock along the looping transition. The coverage metric corresponding to
this mutation model is thus an estimation of the number of modes that could
be detected by the properties if they were to stutter. Though this metric is a
mode-based one, it actually estimates the progress of transitions or computation
runs.

$cov(A, P, \mu_{stutter}) = \frac{|\{m \mid m \in M,\ \exists \phi \in P,\ A^{\mu_{stutter}(m)}, m^0 \not\models \phi\}|}{|M|}$    (4)



Skipped Mode. This model is a mode-based mutation that makes a non-initial
mode unreachable by redirecting all of its incoming transitions to all of its
child modes. Given a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, the skipped
mode model $\mu_{skip}$ when applied to a mode $m \in M$, $m \ne m^0$, gives a mutated
state graph $A^{\mu_{skip}(m)} = (M', m^0, C, D, L, \chi', T', \lambda', \tau', \rho')$, where $M' = M \setminus \{m\}$,
$\chi'(m') = \chi(m'), \forall m' \in M'$, $T' = (T \setminus \{e \mid e = (m_s, m)\}) \cup \{e \mid e = (m_s, m_d)\}$, where
$m_s$ and $m_d$ are the predecessor and successor modes of $m$; for each new transition $e'$ that
corresponds to a deleted transition $e$, $\lambda'(e') = \lambda(e)$, $\tau'(e') = \tau(e)$, $\rho'(e') = \rho(e)$,
and all other transitions have unaltered synchronization labels, triggers, and
assignments.
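A minimal sketch of the skipped mode mutation on the transition structure alone is given below. Labels, triggers, and assignments are omitted, the function name `skip_mode` is hypothetical, and dropping the outgoing transitions of the removed mode is a choice made in this sketch so that no transition refers to a mode outside the mutated mode set.

```python
def skip_mode(modes, trans, m):
    """Remove mode m and connect each predecessor of m directly to each successor."""
    preds = {s for (s, d) in trans if d == m and s != m}
    succs = {d for (s, d) in trans if s == m and d != m}
    kept = {(s, d) for (s, d) in trans if s != m and d != m}
    bypass = {(s, d) for s in preds for d in succs}
    return modes - {m}, kept | bypass

modes = {"a", "b", "c"}
trans = {("a", "b"), ("b", "c")}
new_modes, new_trans = skip_mode(modes, trans, "b")
```

Skipping `b` turns the run a, b, c into the shorter run a, c, which is the reduction in path length that the text contrasts with the Delayed Transition model.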

This mutation model has a reversed effect compared to that of the Delayed 
Transition model because skipping a mode implies a reduction of the compu- 
tation run into a shorter one, which in turn, implies a shorter time is required 
to reach the modes beyond the skipped one. The temporal sequence of modes 
differs after mutation. This mutation model is not applied on the initial state 
because it has no incoming transition. 

Since the skipped mode mutation model tests the presence of each individual
mode, the corresponding coverage metric is an estimation of the number of
modes whose existence can be detected by a set of properties.

$cov(A, P, \mu_{skip}) = \frac{|\{m \mid m \in M,\ m \ne m^0,\ \exists \phi \in P,\ A^{\mu_{skip}(m)}, m^0 \not\models \phi\}|}{|M| - 1}$    (5)



Removed Transition. This model is a transition-based mutation that deletes
a transition from a state graph. This mutation checks if the existence of a
transition is detectable by a given set of property specifications. If a property is
sensitive to the existence of a particular transition, then it will not be satisfied in the
mutated state graph. Given a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, the
removed transition model $\mu_{removed}$ when applied to a transition $e = (m_s, m_d) \in T$
gives a mutated state graph $A^{\mu_{removed}(e)} = (M', m^0, C, D, L, \chi', T', \lambda', \tau', \rho')$,
where $M' = M \setminus \{m \mid m \in M$, $m$ is unreachable after removing $e\}$, $\chi'(m') = \chi(m)$,
$\forall m' = m \in M' \cap M$, $T' = T \setminus \{e' \mid e' \in T$, $e'$ is unreachable after removing
$e\}$, and all remaining transitions in $T'$ have the same synchronization labels,
triggers, and assignments as their original counterparts in $T$.
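The pruning of unreachable modes and transitions after a removal can be sketched as a reachability pass. The sketch assumes an untimed graph given as plain sets; the function name `remove_transition` and the example are hypothetical.

```python
def remove_transition(modes, init, trans, e):
    """Delete transition e, then keep only what is still reachable from `init`."""
    remaining = set(trans) - {e}
    seen, stack = {init}, [init]
    while stack:
        m = stack.pop()
        for (s, d) in remaining:
            if s == m and d not in seen:
                seen.add(d)
                stack.append(d)
    # A transition survives only if its source mode is still reachable.
    kept = {(s, d) for (s, d) in remaining if s in seen}
    return modes & seen, kept

modes = {"a", "b", "c", "d"}
trans = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")}
new_modes, new_trans = remove_transition(modes, "a", trans, ("b", "c"))
```

Removing `("b", "c")` makes `c` unreachable, so both `c` and its outgoing transition `("c", "d")` disappear from the mutated graph, while `d` stays reachable through `("a", "d")`.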

The coverage metric corresponding to the removed transition mutation model
gives an estimate of the number of transitions whose existence can be detected
by a given set of properties.

$cov(A, P, \mu_{removed}) = \frac{|\{e \mid e \in T,\ \exists \phi \in P,\ A^{\mu_{removed}(e)}, m^0 \not\models \phi\}|}{|T|}$    (6)



Mutated Invariant. This model is a mode-based mutation and is a simpli- 
fication of that proposed by Hoskote et al. in [9]. In this mutation, instead of 
toggling the value of an observed signal in a state, its invariant is changed. This 
is more of a semantic mutation than a structural one. 

In our implementation, we have skipped the observability transformation 
and the dual operation on observed propositions as defined in [9]. Further, 
we mutate each mode twice: once with the false invariant and once with the 
true invariant. Model checking is also done twice for each mutated mode. 
Then, we take the union of the covered modes as the coverage for this 
mutation model. If a mode with a false/true invariant does not affect the
satisfaction of any given property, then the mode is not covered. Given a
state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$, the mutated invariant model
$\mu_{inv}$ when applied to a mode $m \in M$ gives two mutated state graphs
$A^{\mu_{inv}(m, true)} = (M, m^0, C, D, L, \chi', T, \lambda, \tau, \rho)$, where $\chi'(m') = \chi(m'), \forall m' \ne m$
and $\chi'(m) = false$, and $A^{\mu_{inv}(m, false)} = (M, m^0, C, D, L, \chi', T, \lambda, \tau, \rho)$, where
$\chi'(m') = \chi(m'), \forall m' \ne m$ and $\chi'(m) = true$.
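The two-mutant scheme and the union of covered modes can be sketched as follows. Invariants are kept as strings, and `violates` is a hypothetical stand-in for a model-checking run that reports whether some property fails on the mutated graph.

```python
def invariant_mutants(inv, m):
    """Yield the two mutated invariant maps for mode m (false, then true)."""
    for value in ("false", "true"):
        mutated = dict(inv)       # all other modes keep their invariants
        mutated[m] = value
        yield mutated

def covered_by_invariant(inv, modes, violates):
    """Modes for which at least one of the two mutants violates some property."""
    return {m for m in modes
            if any(violates(mut) for mut in invariant_mutants(inv, m))}

inv = {"a": "x<=5", "b": "true"}

def violates(mutated_inv):
    # Toy oracle: the property suite only notices changes to the invariant of "a".
    return mutated_inv["a"] != "x<=5"

covered = covered_by_invariant(inv, {"a", "b"}, violates)
```

Because `any` is used across the two mutants, a mode counts as covered as soon as either mutant is detected, which corresponds to taking the union described above.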

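As defined in Equation (7), the coverage metric corresponding to the mutated
invariant mutation model gives an estimation of the number of modes that can
be detected once their invariants are mutated.

$cov(A, P, \mu_{inv}) = \frac{|\{m \mid m \in M,\ \exists \phi \in P,\ A^{\mu_{inv}(m, true)}, m^0 \not\models \phi \text{ or } A^{\mu_{inv}(m, false)}, m^0 \not\models \phi\}|}{|M|}$    (7)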



4.3 The Overall Coverages 

After computing the coverage with respect to each metric, we can also calculate the
overall coverages. The overall coverages show how many parts of a design were ever
covered by any metric and integrate all related metrics to show the holes that
were never covered in any metric computation for a given set of property
specifications.

Based on the basic element we apply mutation to, that is, the type of muta- 
tion model, we classify the metrics into two categories. 

— Mode Coverage Metrics: The metrics corresponding to mode-based muta- 
tion models including Mutated Initial, Stuttering Mode, Skipped Mode, and 
Mutated Invariant are called mode coverage metrics. 

— Transition Coverage Metrics: The metrics corresponding to transition-based 
mutation models including Delayed Transition and Removed Transition are 
called transition coverage metrics. 

Definition 6. Overall Mode Coverage

Given a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$ and a set of TCTL
properties $P = \{\phi\}$, the overall mode coverage is defined as the ratio of modes covered by
any one of the four mode coverage metrics for $P$. Let the sets of covered states for
each of the mode coverage metrics be, respectively, $CS_{initial}$, $CS_{stutter}$, $CS_{skip}$,
and $CS_{inv}$. These sets are defined in the numerators of the fractions in
Equations 2, 4, 5, and 7, respectively. The overall mode coverage is then defined as
in Equation (8).

$cov(A, P, \mu_{mode}) = \frac{|CS_{initial} \cup CS_{stutter} \cup CS_{skip} \cup CS_{inv}|}{|M|}$    (8)

Definition 7. Overall Transition Coverage

Given a state graph $A = (M, m^0, C, D, L, \chi, T, \lambda, \tau, \rho)$ and a set of TCTL
properties $P = \{\phi\}$, the overall transition coverage is defined as the ratio of transitions
covered by any one of the two transition coverage metrics for $P$. Let the sets of
covered transitions for each of the transition coverage metrics be, respectively,
$CT_{delay}$ and $CT_{removed}$. These sets are defined in the numerators of the
fractions in Equations 3 and 6, respectively. The overall transition coverage is then
defined as in Equation (9).

$cov(A, P, \mu_{trans}) = \frac{|CT_{delay} \cup CT_{removed}|}{|T|}$    (9)
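The overall coverages amount to a set union followed by a ratio, as the following sketch shows. The function name `overall_coverage` and the example sets are hypothetical and do not correspond to any of the designs in Section 5.

```python
def overall_coverage(covered_sets, total):
    """Percentage of elements covered by at least one of the given metrics."""
    union = set().union(*covered_sets)
    return 100.0 * len(union) / total

# Hypothetical covered-transition sets for a graph with four transitions.
ct_delay = {"t1", "t2"}
ct_removed = {"t2", "t3"}
trans_cov = overall_coverage([ct_delay, ct_removed], total=4)

# Elements never covered by any metric are the holes worth inspecting.
holes = {"t1", "t2", "t3", "t4"} - (ct_delay | ct_removed)
```

An element covered by several metrics is counted once, so the overall value is at least the largest individual coverage and at most their sum.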

5 Experimental Results 

We have implemented all the six proposed mutation models into the State Graph 
Manipulators (SGM) model checker [15], which is a high-level model checker for 
both real-time systems as well as systems-on-chip modeled by a set of timed 
automata. Several optimizations were implemented in the mutated model
generation and the coverage metric calculations. Due to page limits, we skip this
part of the discussion.

We applied our proposed mutation models and estimated the coverage using 
the corresponding proposed metrics for several practical design models that we 
created in the course of this project work, including a simple timer, a bridge 
model for the ARM AMBA bus architecture, an APB slave model in AMBA, a 
traffic light controller, and a bakery scheduler. Table 1 shows the overall coverage 
estimation results for the above examples along with the performance results of 
the implemented coverage estimators. The model checker with the coverage
estimation programs was executed on a Linux Mandrake 8.1 workstation with a 1.0
GHz Intel Pentium CPU and 512 MB memory.

The simple timer is a timer with three stages. After a user-defined time
interval, the timer goes to the next stage. When the timer reaches a pre-defined
maximum value, it issues a reset signal. In our first attempt, we specified three
properties to verify the timer model, which included checking the time progress
(AG(timer=0 → AF(timer>0))), the initial behavior (timer=1), and the
effect of reset (AG(reset=0 → AF(timer=0))). After applying our proposed
coverage metric estimation, as shown in Table 1, the mode coverage was 100%,
but the transition coverage was only 91.67%. On analyzing the uncovered
traces generated by the SGM coverage estimator, we found that one of the transitions
in the state graph model of the timer was not covered by the three properties.
The uncovered transition deasserts the reset signal after it was asserted. Later,
we wrote an additional property (AG(reset=1 → EF(timer=0))) to cover this
deassertion behavior, and it was then possible to achieve 100% transition coverage.

Table 1. Overall Coverage Estimation Results

State Graph (A)            Graph Size   # Properties   Time*/Memory**   cov(A,P,μ_mode) (%) /
                           |M|/|T|                                      cov(A,P,μ_trans) (%)
Simple Timer               18/18        3              0.30/0.40        100.00/ 91.67
                                        4              0.32/0.44        100.00/100.00
AMBA APB Bridge            10/24        8              0.45/0.01        100.00/ 91.67
AMBA APB Slave             26/168       3              1.46/0.08        100.00/100.00
Traffic Light Controller   253/542      5              210.89/1.70       90.00/ 79.17
Bakery Scheduler           1293/2073    5              2087.65/1.08      68.36/ 10.51

*Time is in seconds, **Memory is in MB.
Note: the μ_inv model is not included in these overall estimations.

Table 2 shows the coverage metrics estimated for each mutation model. From
the numeric coverages, we get an idea of the completeness of the given
properties. In Table 2, a threshold of 30% was assumed for highlighting all coverages
that are below the threshold percentage. We observe that the AMBA APB slave
was the most uncovered application. Evidently, the three given properties are
insufficient and more are needed to perform a thorough verification of the model.
The other poorly covered application is the bakery scheduler. Although five
properties were specified for this system, the model itself was the largest in
size among all the examples; hence many more properties are required to obtain a
higher coverage. From the above two poorly covered examples, we can conclude
that poor coverages are a result of a disproportionately small number of properties
compared to the size of the system model. In other words, the larger the system,
the more properties are required for a more complete verification. The two best
covered application examples are the simple timer and the AMBA APB Bridge,
which also show that it is relatively easier to achieve high coverages for small
and simple systems.

Table 2. Coverage Metric Estimations for each Mutation Model

State Graph (A)            # Properties   cov(A, P, μ) (%)
                                          μ_initial   μ_delay   μ_stutter   μ_skip    μ_remove   μ_inv
Simple Timer               4              100.00      94.44       5.88*     100.00     94.44     100.00
AMBA APB Bridge            8               44.44      91.66      90.00      100.00     20.83*    100.00
AMBA APB Slave             3                0.00*    100.00       0.00*       4.00*     4.77*    100.00
Traffic Light Controller   5               53.57      72.61      11.68*       0.00*    13.69*    100.00
Bakery Scheduler           5               52.02      10.36*     22.19*       1.29*     0.49*    100.00

An asterisk (*) marks below-threshold coverages (threshold = 30%).

As far as mutation models are concerned, we can observe from Table 2 that
the μ_inv model achieved 100% coverage for all the examples, whereas the
μ_remove model achieved an above-threshold coverage only for the simple timer.
These observations illustrate the different natures of the models and the relative
ease or difficulty with which we can achieve higher coverages for different
mutation models. Hence, it is not always necessary to increase all
coverages. This depends on the characteristics of the application example itself, as
described in the following.

Timing delay is an important factor in both the simple timer and the traffic
light controller examples, hence its coverage as modeled by μ_delay was required to
be as high as possible. Currently, the obtained coverages of 94.44% and 72.61%,
respectively, are still quite low. Starvation is an undesired feature in the bakery
scheduler example, hence its coverage as modeled by μ_stutter was required to be
as high as possible. The five given properties achieved only 22.19% stuttering
mode coverage, which shows that more properties are required.



6 Conclusions and Future Work 

We have proposed a coverage estimation methodology that gives formal
verification a quantitative measure of how exhaustive it is. Based on the estimation
and the analysis of uncovered traces, the verification engineer can decide whether a
further verification iteration is required. In addition, the log file shows which
parts of a system model are not covered and thus need more properties to
exercise them; alternatively, a user may choose to refine the system model. Instead
of focusing on the static analysis of states as in several previous works, we proposed
six different mutation models and their corresponding coverage metrics to capture the
behaviors of a model, which are also its most error-prone part. The proposed estimation
needs no extra effort beyond conventional model checking. Further, the complexity of
estimation is acceptable in practice, as we have shown in Section 5. Future work
consists of proposing some semantic mutation models and coverage metrics and
making the structural mutation models more exhaustive and complementary.

Abstract. Model checking is a powerful automated formal technique
for verifying the properties of reactive systems. In practice, model
checkers are limited by the state explosion problem (the number of
states to explore grows exponentially with the number of processes in
the system). Modular verification based on the assume-guarantee paradigm
mitigates this problem by using a "divide and conquer" technique: the
system's components are checked under a set of user-supplied assumptions about
the environment (the environment model), and these assumptions must then
be verified on the environment (guarantee or assumption discharge).
Unfortunately, this approach is not automated because the user must specify
the environment model (assumptions). In this work, we present a novel technique
that automatically generates assumptions for all the components of a
system. The proposed algorithm computes the environments of all components
in the system simultaneously, so that the assumptions generated for one
component can be used to determine the assumptions of another component
with which it communicates.
The assumptions are computed as association rules between the
component's interfaces. We applied our approach to the modular verification of
a steam boiler control program.



1 Introduction 

Model checking procedures are powerful and useful automated tools for the
verification of finite-state systems [9]. Given a property and a model of the system,
the model checker performs an exhaustive state space exploration of the
system and returns true if the property holds in the system, or a counterexample
(a scenario illustrating the property's violation) if it does not. However, the main
drawback is the state space explosion problem. In general, the size of the state space
with which the model checker must cope grows exponentially with the number of
processes that form the system. This is the case for reactive systems, because they
interact continuously with the environment and are composed of many
parallel and concurrent processes. Systems with large state spaces can be
verified by introducing several techniques, such as partial order reductions [21] and
symbolic representation of states and transitions [19], into the model checking
procedures. However, these techniques still have limits, and many realistic systems
are not tractable.

The use of modular or compositional techniques is one approach to avoiding the state
explosion problem [25]. The system is broken up into components, and each one
is analysed separately. Then, the result for the entire system is deduced from
the analyses obtained for the individual components. When model checking is
applied to an isolated component, we need to introduce a model of the
environment's interaction with the component. If this environment is not supplied, we
consider the "universal environment" or the "most general environment": the
component can receive and send any event, in any order, in the environment.
Obviously, this approach is not realistic because components are designed to
operate in a concrete environment. This problem is also commonly referred to as the
environment problem [20].

Assume-guarantee reasoning [22] [1] addresses this problem. The environment
model is specified by means of assumptions provided by the engineer, and the
component is verified together with these assumptions. Then, the environment must
guarantee these assumptions, which are verified against the remaining
components. In general, the assumptions are incorporated into the component
manually, based on the user's knowledge and the feedback obtained from the
counterexamples.
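One common way of writing this reasoning in the assume-guarantee literature is the following non-circular rule for two components. It is given here only as an illustration; the notation ($M_1$, $M_2$ for components, $A$ for the assumption, $P$ for the property) is an assumption of this sketch and may differ from the formulation used later in this paper.

```latex
\[
\frac{\langle A \rangle\; M_1\; \langle P \rangle
      \qquad
      \langle \mathit{true} \rangle\; M_2\; \langle A \rangle}
     {\langle \mathit{true} \rangle\; M_1 \parallel M_2\; \langle P \rangle}
\]
```

The first premise checks the component $M_1$ under the assumption $A$; the second premise discharges $A$ on the rest of the system.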

This approach has various weaknesses. First, although verification with the
model checker is fully automatic, the modular verification used by this
process is not. The engineer must provide the assumptions, which is a hard task,
and, moreover, these assumptions must be discharged in the environment.
Second, the process is error-prone, because manual assumptions may fail to restrict the
environment sufficiently and thus produce false positives. Finally, the approach
is costly in both human effort and computational resources.

In this paper, we address the problem of automatic assumption generation
in the modular verification of reactive software specifications. In a previous
work [24], we presented an approach for automatically generating assumptions
for a single component of the system, based on the exploration of its
environment. This algorithm is extended here in order to compute the
assumptions simultaneously for all the system’s components. Thus, the
assumptions obtained for a component are influenced by the assumptions computed
for one or more other components, which, as a whole, produces stronger and more
realistic assumptions.

In the next section, we briefly review the underlying computational model and
modular verification based on assume-guarantee reasoning. Section 3 summarizes
the method for single component assumption generation, and Section 4 shows the
algorithm to generate assumptions for various components simultaneously. This
technique is applied to a case study and compared with the previous one in
Section 5. Section 6 concludes with a description of related and further work.

2 Background 

Before any verification effort, we must define the syntactic and semantic model 
of the software specifications. We use state transition diagrams (STDs) to model 
the system’s individual processes. The complete system is specified as a set of 
STDs, synchronized by events, named Synchronous Reactive System (SRS), and 
components will be modelled as slightly modified SRSs, in order to capture the 
interface behavior. This computational model has been defined in [23]. Next, we 
will briefly summarize the notation. 

Syntactic and Semantic Issues: Fig. 1 shows a system formed by two processes,
which are modelled by the STDs P1 and P2. Arrows without sources indicate the
STD’s initial state. Other arrows show local transitions in the STDs, and they
are labelled in the form cond/acc, where cond is an input event (or a predicate
over input events) and acc is a set of output events. A transition in an STD is
enabled when the STD is in the local source state and cond evaluates to true.
An external input event is one generated by the environment (events INIT and
FINISH). An external output event is one that the system produces for the
environment (events OPEN and CLOSE). The remaining events are internal events,
which synchronize the STDs (events Start and Stop). We refer to the local
state of an STD as a control state.



[Figure: STDs P1 and P2; the transitions of P2 are labelled Start / OPEN and Stop / CLOSE]

Fig. 1. System with two STDs



The execution semantics that we adopted is similar to that described in
STATEMATE [13] or RSML [18]. The system’s execution is divided into two
time models, macrostep and microstep. In each microstep, a subset of enabled
transitions is taken and executed at the same time, changing to new local states
in the STDs and generating external output and internal events, which, in turn,
can initiate another microstep. In the Fig. 1 example, the STD P1 is initially
in the OFF position, and the STD P2 in the CLOSED position. When the INIT event
occurs, the transition from OFF to ON is enabled and executed, generating the
event Start, which will, in turn, enable the transition from CLOSED to OPENED
in STD P2 in the next microstep, producing the external output event OPEN. The
system is in a stable configuration when no more microsteps can be executed.
The time interval between two stable configurations is called a macrostep.
Additionally, we must also consider the synchrony hypothesis [7]: during a
macrostep, no external input events can come from the environment.
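
The microstep/macrostep semantics described above can be illustrated with a minimal Python sketch of the Fig. 1 system. Only the OFF to ON and CLOSED to OPENED transitions are taken from the text; the return transitions (ON to OFF on FINISH, OPENED to CLOSED on Stop), the assumption that an event lives for exactly one microstep, and all function and variable names are hypothetical choices made for the illustration.

```python
# Each transition: (source state, triggering event, target state, emitted events).
# The FINISH/Stop and Stop/CLOSE transitions are assumed, not taken from the text.
STDS = {
    "P1": [("OFF", "INIT", "ON", {"Start"}),
           ("ON", "FINISH", "OFF", {"Stop"})],
    "P2": [("CLOSED", "Start", "OPENED", {"OPEN"}),
           ("OPENED", "Stop", "CLOSED", {"CLOSE"})],
}
EXTERNAL_OUTPUTS = {"OPEN", "CLOSE"}


def macrostep(states, external_inputs):
    """Run microsteps until the configuration is stable.

    External inputs are only visible in the first microstep (synchrony
    hypothesis); afterwards only internal events can enable transitions.
    Returns (new control states, external outputs, number of microsteps).
    """
    states = dict(states)
    events = set(external_inputs)
    outputs = set()
    microsteps = 0
    while True:
        # Collect all enabled transitions before changing any state,
        # so that they are executed "at the same time".
        fired = []
        for name, transitions in STDS.items():
            for src, cond, dst, acc in transitions:
                if states[name] == src and cond in events:
                    fired.append((name, dst, acc))
                    break
        if not fired:  # stable configuration: the macrostep ends
            return states, outputs, microsteps
        microsteps += 1
        generated = set()
        for name, dst, acc in fired:
            states[name] = dst
            generated |= acc
        outputs |= generated & EXTERNAL_OUTPUTS
        events = generated - EXTERNAL_OUTPUTS  # internal events for the next microstep


final_states, produced, count = macrostep({"P1": "OFF", "P2": "CLOSED"}, {"INIT"})
print(final_states, produced, count)
```

Under these assumptions, the INIT event leads to a macrostep of two microsteps: the first moves P1 to ON and emits Start, the second moves P2 to OPENED and produces OPEN.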

Synchronous Reactive Systems: A Synchronous Reactive System (SRS) $\Phi$ is a
5-tuple $\langle A, G, I, O, \rightarrow_\Phi \rangle$, where $A$ is the set of STDs, $G$ is the set of
internal events that communicate the STDs, $I$ is the set of input events
received from the environment, $O$ is the set of output events provided to the
environment and $\rightarrow_\Phi$ is the transition relation between the system’s states.
The (global) state is formed by the control states of each STD in $A$. We denote
the configuration of an SRS by $C = (S, C_G, C_I, C_O)$, where $C_G$, $C_I$ and $C_O$ are
the values of the internal, input and output events at any moment (macrostep
or microstep) and $S$ is the state. The set of all possible configurations of
$\Phi$ is denoted by $Global(\Phi)$. The transition relation
$\rightarrow_\Phi \subseteq Global(\Phi) \times Global(\Phi)$ describes the execution of the SRS at
the microstep level. Intuitively, $(C, C') \in \rightarrow_\Phi$ means either the execution of
a microstep in the SRS, possibly changing the state of $\Phi$ or generating
output and/or internal events in $C'$, which could activate other microsteps, or
that $C$ is a stable configuration.

Synchronous Reactive Components: When we break up the system into different
components, the internal events that communicate them must be treated
differently from the others, because they can take any value in each microstep.
Note that this behavior is different from that of the input events, which take
values at the beginning of each macrostep. For this reason, a component’s
behavior cannot be directly mapped to an SRS’s behavior. Thus, a Synchronous
Reactive Component (or component, for short) is a tuple
$\langle A, G, I, O, II, OI, \rightarrow_\Phi \rangle$, where $A$, $G$, $I$, $O$ and $\rightarrow_\Phi$ have the same
meaning as in the SRS, $II$ is the set of input interface events, received
from other components, and $OI$ is the set of output interface events.

Assume-Guarantee Reasoning: Various formulations of assume-guarantee
reasoning exist. We will use the Abadi and Lamport composition theorem [1].
This theorem is based on the following induction rule. Let $S$ be the whole
model, which can be decomposed into the parallel composition $C_1 \parallel C_2$,
and $\varphi$ the desired property to be checked, which can be decomposed into
$\varphi = \varphi_1 \wedge \varphi_2$. In order to prove $\varphi$ over $S$, it is enough to prove:

- Each $\varphi_i$ is true for the component $C_i$ under the assumption $\psi_i$

- Each component $C_i$ grants the assumption $\psi_j$, $i \neq j$, of each component
  over its environment

In our framework, these assumptions, which are specified for a component’s
environment, are related to its input interface.

Microstep Visibility: When we verify a component in isolation, input interface
events can take any value in each microstep. For this reason, and unlike
other approaches [17], where the microsteps are eliminated prior to the
verification task, we make the microsteps visible. We include a new variable
$\mu S$ in each configuration of the component, which acts as a microstep
counter (a similar idea is presented in [8] to prune backward searches in
BDD-based model checking). When a new microstep is executed, the variable
$\mu S$ is increased by 1, and it is reset to 0 when a new macrostep starts. In
general, this approach is less efficient, because it increases the size of the
state space, but the explicit representation of all microsteps allows us to
state the adequate assumptions, and prove them, without any distortion of the
entire model semantics.

3 Assumption Generation for a Single Component 

In this section, we will briefly describe the method to automatically derive as- 
sumptions for a single component. The algorithm was previously discussed in 
[24]. Fig. 2 illustrates the general method and the phases to construct the
assumptions for the verification of a property in the component P, in a global
system composed of the component P and the component Q (its environment).
We will denote this approach as IndGen.



Fig. 2. Assumption Generation for a Single Component (IndGen)



In the Composition stage, all configurations of the component’s environment
are generated at the microstep level. Let $E = OI_P \cap II_Q$ be the set of input
interface events, which communicate the environment and the component to be
verified. The composition phase computes the set $Global'(P) = Global(P) \uparrow E$,
where the operator $\uparrow$ erases from $Global(P)$ those references to output
interface events of the environment that are not part of the input interface of
the component P. We use a reachability algorithm with depth-first search:
starting from the initial configuration of the component P and considering each
possible value of the input events, new configurations are generated using the
microstep operator $\rightarrow_\Phi$. The process continues with each new reachable state,
and finishes when all configurations are stable (no more microsteps can be
executed) from the reachable states.
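
The depth-first reachability computation mentioned above can be sketched generically in Python. This is only an illustration under stated assumptions: `successors` is a hypothetical callback standing in for the microstep operator combined with the enumeration of input values, and configurations are assumed to be hashable values.

```python
def reachable(initial, successors):
    """Depth-first exploration of all configurations reachable from `initial`.

    `successors(cfg)` is assumed to return every configuration obtained from
    `cfg` by one microstep, for every admissible value of the input events.
    A stable configuration simply has no successors.
    """
    seen = {initial}
    stack = [initial]
    while stack:
        cfg = stack.pop()          # LIFO order gives a depth-first traversal
        for nxt in successors(cfg):
            if nxt not in seen:    # each configuration is expanded only once
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Toy successor relation over four abstract configurations 0..3.
def toy_successors(cfg):
    return [(cfg + 1) % 4, (cfg * 2) % 4]


print(sorted(reachable(1, toy_successors)))
```

Storing visited configurations in a set guarantees termination on finite state spaces, at the cost of keeping every reached configuration in memory.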

The Extraction step builds the assumptions from the set of configurations
computed in the previous step. Assumptions are obtained by adapting the Apriori
algorithm [3], used for discovering association rules between attributes in
databases, where the attributes are the events in $E \cup \{\mu S\}$ and the database
is the set of configurations $Global'(Q)$. Thus, an assumption for the component
P is an implication $X \rightarrow Y$, where $X \subseteq E \cup \{\mu S\}$ and $Y \subseteq E$. For example, in
the Fig. 1 component, if we assume that INIT and FINISH are input interface
events, the assumption $\mu S = 1 \rightarrow INIT = 0, FINISH = 0$ means that the
component does not receive the INIT and FINISH events in the first microstep.
As the set of configurations computed in the composition phase contains all
possible execution sequences of the environment, the assumptions obtained in
the extraction phase will reflect some behavior patterns between the interface
events and the moment in which they are produced.
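
To make the shape of such assumptions concrete, the following Python sketch extracts only the simplest kind of rule, "$\mu S = k \rightarrow event = v$", by checking which interface events take a single value in every configuration of a given microstep. It is a deliberately reduced stand-in, not the adapted Apriori algorithm of [24]; the dictionary encoding of configurations, the key name `muS` and the function name are assumptions made for the example.

```python
from collections import defaultdict


def extract_assumptions(configurations, interface_events):
    """Return rules (microstep, event, value) that hold in all configurations.

    Each configuration is a dict with a 'muS' entry (microstep counter) and a
    0/1 entry per interface event. A rule is emitted when an event has the
    same value in every configuration that shares the same microstep.
    """
    values_seen = defaultdict(lambda: defaultdict(set))
    for cfg in configurations:
        for event in interface_events:
            values_seen[cfg["muS"]][event].add(cfg[event])

    rules = []
    for micro_step in sorted(values_seen):
        for event in interface_events:
            values = values_seen[micro_step][event]
            if len(values) == 1:  # the event never varies in this microstep
                rules.append((micro_step, event, next(iter(values))))
    return rules


# Hypothetical configuration set for an environment of the Fig. 1 component.
configs = [
    {"muS": 1, "INIT": 0, "FINISH": 0},
    {"muS": 1, "INIT": 0, "FINISH": 0},
    {"muS": 2, "INIT": 1, "FINISH": 0},
    {"muS": 2, "INIT": 0, "FINISH": 0},
]
for rule in extract_assumptions(configs, ["INIT", "FINISH"]):
    print("muS = %d -> %s = %d" % rule)
```

With this toy data, the rules for the first microstep correspond to the example assumption in the text (INIT = 0 and FINISH = 0 when $\mu S = 1$).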

Finally, the obtained assumptions are gathered together with the component and 
the property to be verified into the model checker. More details can be found in 
[24]. 

4 Simultaneous Assumption Generation 

In the previous method, and according to the semantics proposed in Section 2,
assumptions are produced for each component, assuming that the behaviour of
interface events is free. Specifically, in order to generate the set of all
environment configurations during the composition stage, the input interface
events can take any value in each microstep. For example, let us assume that
two components (P and Q) are linked by means of the events a and b (P sends a
to Q and Q sends b to P). In order to compute the assumptions for component Q,
the interface event a takes, in each microstep, the values 0 and 1. The same
applies to the component P, in relation to the event b.

This approach may be pessimistic, because the behavior of the interface events
is not free. For instance, it is possible that the behavior of the event a
is determined by the event b: in a concrete microstep, event a is never sent to
component Q because, in a previous microstep, event b was not sent by
component Q. In this case, we do not need to consider the value 1 of the event
a in the composition phase of the component Q. This remark allows us to obtain
more restricted assumptions, because they are produced by considering a more
realistic behavior of the environment.

In this section, we address this problem. Assumptions are not computed for a
component in isolation; they are produced by considering several components.
In other words, the assumptions produced for a component will be influenced
by those obtained for one or more related components.



4.1 The Method 

Fig. 3 shows the verification process diagram proposed for the simultaneous 
assumption generation in various components (SimGen). 

As in the IndGen algorithm, the process consists of two phases: Composition
and Extraction. In this approach, both are performed in an iterative and
incremental fashion, for each microstep and in each component. According to
the scheme in Fig. 3, the component P computes the configurations and generates
the assumptions for the first microstep. These will be used in component Q
in order to generate its configurations and assumptions corresponding to the
second microstep, which will be used, in turn, by component P to determine
the assumptions in the third microstep, and so on, until no component can
execute more microsteps. Thus, the assumptions for a given microstep capture
the "real" behavior of the environment, instead of assuming that it is free,
as in the case of IndGen. At the end of the process, the assumptions for a
component will be the union of those obtained in each of its computed
microsteps.

Fig. 3. Assumption Generation for Various Components (SimGen)

We formalize this approach for a global system S, composed of two components
$P_1$ and $P_2$. Basically, the process consists of pairs of
Configurations-Assumptions actions (corresponding to the Fig. 3 phases,
Composition and Extraction).

Configurations($P$, $\mu S$, $A$). It computes the configurations of the component
P corresponding to the microstep $\mu S$, using the set of assumptions A. If no
assumptions exist, A is not specified. Since all configurations must be
generated for the microstep $\mu S$, these will be obtained by exploring all the
possible macrosteps. In order to do this, we use a breadth-first search at the
macrostep level, unlike IndGen, where the exploration is made using a
depth-first search, generating all the possible chains of microsteps for a
given macrostep.

Assumptions($C$). It generates, from a set of configurations C, the set of
assumptions corresponding to the events that appear in it. As in IndGen,
assumptions are computed as association rules over the set C, using a variation
of the Apriori algorithm [24].

Using these actions, the algorithm to simultaneously generate assumptions in
various components is:

    microStep = 1;                          /* first microstep */
    For each component Pi in S
        C(Pi) = Configurations(Pi, microStep);
        A(Pi) = Assumptions(C(Pi));
    a = 2; b = 1;
    While !(Stable(S))
        microStep = microStep + 1;          /* other microsteps */
        For each component Pj (j = a, b)
            k = (j mod 2) + 1;
            C(Pj) = Configurations(Pj, microStep, A(Pk));
            A(Pj) = Assumptions(C(Pj));
        a = (a mod 2) + 1;                  /* component interchange */
        b = (b mod 2) + 1;

The process starts when the state spaces corresponding to the first microstep
in each component, $C_1(P_1)$ and $C_1(P_2)$, are computed. Since, in the first
microstep, each component reacts solely to the events coming from the
environment (input events) and not to the interface events, the computation of
the configurations is performed without assumptions. For a microstep i, the
component $P_1$ generates the configurations corresponding to the microstep i,
$C_i(P_1)$, using the assumptions computed by the component $P_2$ in the microstep
i-1, $A_{i-1}(P_2)$. The process is repeated for each microstep, interchanging
the components, until no more microsteps can be executed in either component
(the system is stable). At the end of the process, the sets of assumptions of
both components are obtained as the union of the assumptions obtained for each
microstep. Although the method is specified for two components, the
generalization for n components is straightforward.
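
The alternation between the two components can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the `configurations` and `assumptions` callbacks, the convention that an empty configuration list means "no further microstep can be executed", the `max_microsteps` bound and the toy system are all assumptions of the example. Following the textual description, each component at microstep i uses the assumptions its partner produced at microstep i-1.

```python
def constant_rules(configs):
    """Toy Assumptions(C): rules (muS, event, value) for events that take a
    single value in every configuration of the same microstep."""
    seen = {}
    for cfg in configs:
        for event, value in cfg.items():
            if event != "muS":
                seen.setdefault((cfg["muS"], event), set()).add(value)
    return {(mu, ev, next(iter(vals)))
            for (mu, ev), vals in seen.items() if len(vals) == 1}


def simgen(components, configurations, assumptions, max_microsteps=50):
    """Simultaneous assumption generation for two components.

    configurations(comp, micro_step, received) returns the configurations of
    `comp` for that microstep given the partner's previous assumptions
    (None in the first microstep); an empty result means `comp` is stable.
    """
    first, second = components
    partner = {first: second, second: first}
    collected = {c: set() for c in components}
    previous = {c: None for c in components}  # no assumptions before microstep 1
    for micro_step in range(1, max_microsteps + 1):
        current = {}
        executed = False
        for comp in components:
            configs = configurations(comp, micro_step, previous[partner[comp]])
            executed = executed or bool(configs)
            current[comp] = set(assumptions(configs))
        if not executed:  # both components are stable
            break
        for comp in components:
            collected[comp] |= current[comp]  # union over all microsteps
        previous = current
    return collected


def toy_configurations(component, micro_step, received):
    """Hypothetical system: P1 never sends a; P2 may send b freely unless it
    knows that a was not sent in the previous microstep."""
    if micro_step > 2:
        return []
    if component == "P1":
        return [{"muS": micro_step, "a": 0}]
    if micro_step == 1 or received is None or (micro_step - 1, "a", 0) not in received:
        return [{"muS": micro_step, "b": 0}, {"muS": micro_step, "b": 1}]
    return [{"muS": micro_step, "b": 0}]


result = simgen(("P1", "P2"), toy_configurations, constant_rules)
print(result)
```

In the toy run, P2 obtains no assumption in the first microstep, but in the second one the assumption received from P1 removes the configurations with b = 1, so a rule for b can be extracted; only the configurations of the current microstep are kept in memory.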

The algorithm generates and stores several state spaces (as many as there are
components in the system), but this does not imply a state space explosion. In
fact, the global state space is not computed; instead, several smaller ones,
one for each component, are computed independently. More specifically, we only
need to store, in each component, the state space corresponding to the current
microstep, and not those of the microsteps previously computed in the
component.



4.2 Example 

In order to illustrate the SimGen algorithm, we applied it to the simple system
shown in Fig. 4, which is formed by the components $P_1$ and $P_2$, each with one
STD, which communicate by means of the interface events a and b. Table
1 shows the obtained assumptions for each microstep and component.



Fig. 4. System with Two Components

Table 1. Assumptions for the components of Fig. 4



$\mu S$    $A_{\mu S}(P_1)$    $A_{\mu S}(P_2)$


1 


0 

II 




2 




b=0 


3 


0 

II 


b=0 


4 


0 

II 


0 

II 



This process is started by computing the configurations and assumptions
that correspond to the first microstep in both components. In the first
microstep, the assumption $\mu S = 1 \rightarrow a = 0$ is obtained in the component $P_1$,
which reflects that, in the first microstep (for all macrosteps), the interface
event a is never sent to the component $P_2$. In the component $P_2$, no assumptions
can be generated in the first microstep, because the interface event b will
take values 0 or 1, depending on the value of the external event B. According
to the SimGen algorithm, the process continues in the component $P_2$ by computing
the configurations that correspond to the second microstep, using the
assumptions computed in the first microstep ($\mu S = 1 \rightarrow a = 0$). Thus, in the
second microstep, the component $P_2$ never executes the transition from the state
"1" to "2", because the event a was not received in the previous microstep, and
the assumption $\mu S = 2 \rightarrow a = 0$ is generated. This process is repeated for
each component, until both components are stable (in the example, the fourth
microstep).

5 Application 

In order to evaluate the approach and the algorithm described in this paper,
we applied them to a non-trivial case study that is frequently used in the
formal methods literature in general, and in formal verification in particular:
the steam boiler problem [2]. The functional specification contains 52 STDs, 331
transitions and 467 events, and the declarative specification (requirements)
consists of 60 safety properties. To verify the system, we used the SPIN
model checker [15], version 3.4.8, on a Compaq Proliant with a 933 MHz
processor and 1 Gbyte of memory. The verification of the properties on the
complete functional model was not possible due to state explosion. Thus, we
approach the problem from a modular perspective. Due to lack of space, we will
only consider a part of the system, formed by the components in Fig. 5.

The component CONTROL maintains the control program’s operation mode
and sends the open/close orders (Start/Stop Pumps events) to the pumps,
according to the water level (LevelRange, LevelRisk) detected in the component
WATER. CONTROL is composed of 6 STDs, 36 events and 19 local properties,
and WATER of 5 STDs, 30 events and 10 local properties. We will check
each component assuming a universal environment (without assumptions),
with "manual" assumptions provided by the engineer, and with automatically
generated assumptions, using the algorithms IndGen and SimGen. Because the
verification process is the same in both components, we will focus on the
CONTROL component.

Fig. 5. Components CONTROL and WATER in the Steam Boiler

First, we performed the verification under a universal environment (the
interface events LevelRange and LevelRisk can take any value in any microstep)
and four properties turned out to be false. We analysed the counterexamples
returned by the model checker, and we concluded that the possible cause of the
errors was that the events LevelRange and LevelRisk can be received
simultaneously in the component. Obviously, this environment behavior is not
realistic, and the false properties are really false positives.

We tried to eliminate them by stating the manual assumption that the
LevelRange and LevelRisk events cannot be received at the same moment. In order
to check that the assumption is correct, we verified it in the component WATER
and no counterexample was returned. However, when we verified the component
CONTROL again with the assumption, two properties remained false. The
analysis of the returned counterexamples is not easy, and, after several
attempts and simulations, we were not able to fix the errors. In order to solve
this situation, we automatically generated the assumptions for the component
CONTROL, using the algorithms IndGen and SimGen. With the IndGen algorithm,
four assumptions were produced. The total CPU time used in the environment
computation was 808 seconds. SimGen was applied to the CONTROL and WATER
components, producing six assumptions for the component CONTROL and three for
the component WATER, with a total CPU time of 1209 seconds to compute both
environments.

In Table 2, the obtained assumptions are specified and compared with the manual
specification. Assumptions 1 and 2, obtained with the automatic methods, are
the same as those manually specified. The remaining assumptions are related to
a specific microstep and are obtained only with the automatic approach.
Assumptions 3 and 4 state that the interface events LevelRange and LevelRisk
are not received until microsteps 4 and 3 happen, respectively. Although these
assumptions had no influence on the detection of false positives for the
verified properties, they are very useful to reduce the computational resources
used by the model checker, because they restrict the values that the interface
events can take in each microstep. Finally, if we use SimGen, two new
assumptions are produced. They indicate that LevelRange and LevelRisk are not
received beyond the fifth microstep. Note that these assumptions are not
obtained with the IndGen algorithm, since it is applied to only one component
(WATER) and the maximum length of microsteps reached is 4.



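Table 2. Assumptions for the CONTROL component

     Manual                                    IndGen                                    SimGen
1    $LevelRisk \rightarrow \neg LevelRange$   $LevelRisk \rightarrow \neg LevelRange$   $LevelRisk \rightarrow \neg LevelRange$
2    $LevelRange \rightarrow \neg LevelRisk$   $LevelRange \rightarrow \neg LevelRisk$   $LevelRange \rightarrow \neg LevelRisk$
3    -                                         $\mu S < 3 \rightarrow \neg LevelRange$   $\mu S < 3 \rightarrow \neg LevelRange$
4    -                                         $\mu S < 2 \rightarrow \neg LevelRisk$    $\mu S < 2 \rightarrow \neg LevelRisk$
5    -                                         -                                         $\mu S > 5 \rightarrow \neg LevelRange$
6    -                                         -                                         $\mu S > 5 \rightarrow \neg LevelRisk$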



Table 3 shows the performance of SPIN in the verification of the local
properties of the CONTROL component, without assumptions (Universal), with
manual assumptions (Manual) and with automatically generated assumptions
(IndGen and SimGen). In general, we can observe that the CPU time and the
memory use of the model checker decrease with the incorporation of new
assumptions. This is because the environment is more restricted and the model
checker manages a more compact state space. The introduction of assumptions on
the interface events limits their possible values, their order and even the
microstep in which they can occur. In particular, the assumptions automatically
generated with SimGen substantially improve the performance of the model
checker, by a factor of up to 26.6 with respect to the universal environment,
and 7.7 with respect to manually specified assumptions. On the other hand,
automatically generated environments avoid the discharge of assumptions (needed
in the manual case), since they are guaranteed by the generation method itself.
Finally, only with the automatic approach is it possible to verify all the
properties without producing false positives.
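
The two factors quoted above follow from the TOTAL row of Table 3, reading the SimGen total as 73 seconds. A small Python check, with a hypothetical dictionary name and helper function:

```python
# Total CPU seconds per environment, as reported in the TOTAL row of Table 3.
total_cpu_seconds = {"Universal": 1943, "Manual": 561, "IndGen": 229, "SimGen": 73}


def speedup(baseline, improved="SimGen"):
    """Ratio between the total verification times of two environments."""
    return total_cpu_seconds[baseline] / total_cpu_seconds[improved]


print(round(speedup("Universal"), 1))  # 26.6
print(round(speedup("Manual"), 1))     # 7.7
```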

The main drawback of the automatic approaches shown here in general,
and of the SimGen algorithm in particular, is the amount of resources needed to
compute the assumptions (1209 seconds). However, as SimGen is independent of
the property to be verified, we only need to execute it once, for all properties.

6 Related Work and Conclusions 

The modular or compositional approach is a natural and effective technique
to solve the problem of the verification of large complex systems. During the
last decades, several authors have proposed sound and theoretically
well-founded frameworks for the modular reasoning of reactive systems. We refer
to the book by de Roever et al. [25], where an excellent introduction to the
state of the art in compositional verification is provided.

Table 3. Performance (Seconds and Mbytes) of SPIN in the component CONTROL

         Universal         Manual            IndGen            SimGen
Prop     t        mem      t        mem      t        mem      t        mem
1        135      3.6      30       2.3      9        1.9      3        1.5
2        135      3.6      30       2.3      9        1.9      3        1.5
3        false positive    false positive    13       2.2      5        1.8
4        135      3.6      30       2.3      9        1.9      3        1.5
5        135      3.6      30       2.3      9        1.9      3        1.5
6        196      3.9      41       2.5      12       2.0      4        1.7
7        154      3.6      34       2.3      10       1.9      3        1.5
8        135      3.6      30       2.3      9        1.9      3        1.5
9        135      3.6      30       2.3      9        1.9      3        1.5
10       154      3.6      34       2.3      10       1.9      3        1.5
11       154      3.6      34       2.3      10       1.9      3        1.5
12       139      3.6      30       2.3      9        1.9      4        1.5
13       178      3.8      40       2.5      12       2.1      5        1.7
14       150      3.6      35       2.3      9        1.9      4        1.5
15       139      3.6      30       2.3      9        1.9      4        1.5
16       139      3.6      30       2.3      9        1.9      4        1.5
17       false positive    51       3.0      16       2.4      4        1.5
18       false positive    82       3.8      27       2.9      6        1.5
19       false positive    false positive    34       3.5      6        1.5
TOTAL    1943              561               229               73



However, the practical use and application of these techniques have not evolved
at the same speed as the underlying theories. As Shankar states [26], inference
systems for compositional verification have been studied more than they have
been applied. Recent developments in component-based engineering have provided
new advances in automated formal techniques for verification. The works of
de Alfaro and Henzinger [4] [5] are examples that follow this direction. Some
of these theories have been implemented in the MOCHA toolkit [6]. The Calvin
tool [11] provides automated support for modular reasoning over multithreaded
software.

In the context of assume-guarantee reasoning, support is needed to specify the
most adequate environment assumption [14]. For this reason, it is very useful to
find techniques and methods to automatically generate the component’s
environment. However, although the problem is not new, research on automated
assumption generation is scarce and very recent. In the work of Inverardi et
al. [16], the component’s assumptions are generated, but the method is
restricted to modular checking of deadlock freedom properties. The approach
described in [12] is closer to our work: it presents an algorithm that
automatically generates the weakest assumption that restricts the environment
only where necessary for a component to satisfy a given property. More
specifically, the proposed technique returns one of the following results: the
component satisfies the property for all environments, the component violates
the property for any environment, or an assumption is returned that
characterizes exactly those environments where the component satisfies the
property.

In [10], a framework is presented to perform assume-guarantee reasoning in an
incremental and fully automatic fashion. Assumptions that the environment needs
to satisfy for the property to hold are generated and then discharged on the
rest of the system. These assumptions are computed by using a learning
algorithm. Initially, the obtained assumptions are approximate but they
gradually become more precise, by means of the analysis of the counterexamples
obtained. The main difference, with respect to previous approaches, is that if
the computation runs out of memory (because the state space of the component is
too large), the assumptions computed up to that moment are valid.

Compared with our approach, the above methods are property-dependent, and the
algorithms must be applied for each property to be verified. Moreover, the
obtained assumptions must be discharged over the environment and, thus, the
complete automation of assume-guarantee reasoning is not possible. However,
these approaches are very interesting when the environment is not given or is
unknown.

Our approach generates the assumptions for various components simultaneously
and automatically, reducing the need for user guidance in modular verification.
The techniques presented are also especially relevant for detecting false
positives, as the obtained assumptions are more restricted than the manual
ones. As a consequence, the performance of the model checker is better.

Because of the way in which assumptions are generated, if a property changes 
or new properties are added, we do not need to compute the assumptions again. 
This feature is important in the interactive verification practices, where new 
properties are submitted to be checked in an interactive fashion. 

The application to a non-trivial case study demonstrates the approach’s
capacity to detect false positives and to reduce the resource consumption of
the model checker in particular, and of the complete verification process in
general. This experience suggests improvements to the approach, particularly
regarding the expressiveness of the generated assumptions and the reduction of
the blow-up in the computation of configurations.



Abstract. This paper presents a new group key distribution scheme
based on the Rabin public-key cryptosystem, called Rabin tree, which is
a binary tree where every parent node can be computed by the Rabin
encryption of either of its child nodes. The proposed scheme requires the
same ciphertext size as the LKH method [2], a single individual key,
which is the optimal receiver storage, and a computational overhead
of O(log n) time to extract the session key. The security of the proposed
scheme against malicious receivers is studied. The probability that a given
random root key succeeds in yielding a full Rabin tree with 2n - 1
nodes is proven to be exponential in the number of users n. Finally,
an application to broadcast encryption, which allows faulty receivers to be
excluded, is proposed.



1 Introduction 

Broadcast key distribution technologies are required by a number of
applications, including copyright protection, secure content delivery, and
secure multicast over the Internet. Broadcast key distribution, or broadcast
encryption, aims to allow any legitimate user to decrypt a ciphertext using his
private subscription key, and to deal with dynamic groups where users can
effectively join or leave the group at any time. Users should be able to
decrypt the ciphertext while they are members of the group. Hence, a good
scheme for broadcast key distribution with r users revoked (left) out of n
users should minimize (1) the bandwidth spent for broadcasting a ciphertext,
(2) the receiver storage required for securely storing individual keys, (3) the
key server storage for all individual keys, and (4) the computational overhead
for key establishment at both ends.

The first attempt at broadcast key distribution was made by Wong et al. in
[2]. Their approach is called Logical Key Hierarchy (LKH), where log n
individual keys are assigned to every legitimate user in a group and the size
of the ciphertext is O(r log(n/r)). In the LKH method, a ciphertext consists of
a sequence of root keys for each of the legitimate subsets excluding the
revoked users. Since the LKH scheme was proposed, several improvements have
been made. Perrig et al. proposed the ELK protocol, which addresses reliability
for key update messages [8].

Naor, Naor and Lotspiech presented the Subset Difference (SD) method,
which reduces the bandwidth for the ciphertext from O(r log(n/r)) to 2r - 1
using a pseudo-random number generator. In the SD method, the good subsets are
represented by the difference of two subsets: i, which covers good users, and
j, which excludes the revoked users in i. The SD method requires greater
storage for individual keys at receivers than the LKH method. In order to
reduce the receiver storage size of the SD method, Halevy et al. proposed the
Layered SD method (LSD) in [5]. In the LSD method, each receiver stores
$O(\log^{1+\epsilon} n)$ individual keys, where $\epsilon$ is an arbitrary number such
that $\epsilon > 0$.

Asano achieved the optimal receiver storage, i.e., just one individual key 
at receivers, in [1]. In his scheme, the necessary session keys are resolved from the 
individual key using the Master Key technique at a computational cost of 
O(2^a log^ n / log a), where a is an arbitrary integer satisfying a > 1. The key 
resolution, however, requires a number of public prime numbers, one assigned to 
each possible subset of the n receivers. In order to prevent forged assignments 
of prime numbers, an appropriate read-only hardware device may be required to 
store the primes. Alternatively, dynamic generation of the prime numbers 
is also proposed in [1]; generating primes, however, requires probabilistic tests, 
which can be costly. 

In this paper, we present the Rabin tree, which is a binary tree where every parent node can be computed by the Rabin encryption of either of its child nodes. The Rabin tree can be used to minimize the receiver storage in the LKH method, as Asano’s scheme does, but it does not require any additional read-only storage for prime numbers. The Rabin cryptosystem, one of the public-key encryption algorithms proposed by Rabin in the very early days of public-key cryptography [7], has the decryption issue that the result of decryption cannot be determined uniquely, i.e., there can be multiple possible plaintexts $M_1$, $M_2$ for one ciphertext $C$ such that 

$C = E[M_1] = E[M_2]$. 

Our idea makes use of this property of the Rabin encryption to build a key tree by having $C$, $M_1$ and $M_2$ at the parent, left and right nodes of the key tree, respectively. Since the Rabin encryption requires just one modular squaring of the plaintext, resolving a key from any of the leaves is a lightweight process and is thus feasible on Personal Digital Assistants with restricted computational power. 

The basic idea comes from the protocol presented by Nojima and Kaji [6], where two one-way trapdoor permutations are used to derive both child keys from a parent key. Instead of one-way permutations, we use the Rabin cryptosystem, which is not a permutation. 

The organization of this paper is as follows. First, we review the LKH method and the Rabin cryptosystem. After the definition of the Rabin tree is given, we examine some properties of the Rabin tree in terms of security and present a construction algorithm. Finally, we apply the Rabin tree to broadcast key distribution and estimate its performance in terms of bandwidth and computational overhead. 

2 Preliminary 

2.1 LKH (CS) Method 

Let $N$ and $R$ be the set of all receivers (users) and the subset of revoked receivers, respectively. We write $n = |N|$ and $r = |R|$. 

For simplicity, we suppose $n$ is a power of 2 and consider only complete binary trees in this paper. The scheme extends naturally to k-ary trees. 

The LKH method has a binary tree $T$ of $2n - 1$ key nodes with $n$ leaves. We use binary heap notation for $T$, i.e., the $i$-th node $v_i$ has its left child, right child and parent addressed by $v_{2i}$, $v_{2i+1}$ and $v_{\lfloor i/2 \rfloor}$, respectively.¹ A key authority independently assigns a key $x_i$ to every node $v_i$ of $T$, and securely distributes to every user the sequence of $\log n$ keys on the path from the leaf to the root node, i.e., $x_i, x_{\lfloor i/2 \rfloor}, x_{\lfloor i/4 \rfloor}, \ldots, x_1$. To send an encrypted message $M$ (digital contents), a sender broadcasts $m$ ciphertexts 

$E_{x_{i_1}}(sk), E_{x_{i_2}}(sk), \ldots, E_{x_{i_m}}(sk)$ 

in conjunction with $E_{sk}(M)$, where $sk$ is a session key, $m = \log n$ and $i_1, i_2, \ldots, i_m$ are the root nodes of the subtrees that cover all leaves in $N \setminus R$. A receiver can decrypt at least one of the $m$ ciphertexts if and only if it belongs to $N \setminus R$. 

It has been proven in [4] that the LKH method requires O(r log(n/r)) ciphertexts to be broadcast, log n keys of receiver storage, and O(log log n) computations for key establishment. 
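
As an illustration of how the subtree roots that cover the non-revoked leaves can be found in heap notation, the following minimal sketch marks every node on a path from a revoked leaf to the root and then collects the unmarked children of marked nodes. The function name `complete_subtree_cover` and the choice to identify receivers by their leaf indices n..2n-1 are assumptions made for this example; they do not appear in the paper.

```python
def complete_subtree_cover(n, revoked_leaves):
    """Return heap indices of the subtree roots covering all non-revoked leaves.

    Nodes are numbered 1..2n-1 in binary-heap order; leaves are n..2n-1.
    """
    if not revoked_leaves:
        return [1]  # nobody is revoked: the root key alone covers everyone

    # Mark every node lying on a path from a revoked leaf up to the root.
    marked = set()
    for leaf in revoked_leaves:
        node = leaf
        while node >= 1 and node not in marked:
            marked.add(node)
            node //= 2

    # An unmarked child of a marked internal node roots a subtree
    # that contains no revoked leaf.
    cover = []
    for node in marked:
        if node < n:  # internal node
            for child in (2 * node, 2 * node + 1):
                if child not in marked:
                    cover.append(child)
    return sorted(cover)
```

With n = 8 and the single revoked leaf 9, the cover consists of three subtree roots, which agrees with r log(n/r) = 3 for r = 1.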

2.2 Rabin Cryptosystem 

The Rabin cryptosystem is a public-key encryption algorithm based on the integer factorization problem [7]. 

Let $p$ and $q$ be safe primes forming the private key and let $N = pq$ be the public key. To encrypt a message $M \in Z_N$, compute 

$C = M^2 \bmod N$. 

To decrypt $C$, compute $M_p$ and $M_q$ such that $M_p = \sqrt{C} \bmod p$ and $M_q = \sqrt{C} \bmod q$, and use the Chinese Remainder Theorem for the four possible pairs $(M_p, M_q)$, $(M_p, -M_q)$, $(-M_p, M_q)$ and $(-M_p, -M_q)$. One of the four results must be the plaintext, which can be identified in an appropriate manner, for instance by testing redundant bits embedded in the plaintext. 
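
The encryption and four-root decryption can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes p and q are primes congruent to 3 modulo 4 (which holds for safe primes larger than 5), so that a square root modulo p is obtained by one exponentiation, and it assumes the input really is a quadratic residue. The function names are chosen for this example, and `pow(q, -1, p)` requires Python 3.8 or later.

```python
def rabin_encrypt(m, n):
    """Rabin encryption: square the message modulo the public key n."""
    return pow(m, 2, n)


def rabin_decrypt(c, p, q):
    """Return the four square roots of c modulo p*q, in increasing order.

    Assumes p and q are distinct primes with p % 4 == q % 4 == 3 and that
    c is a quadratic residue modulo both primes.
    """
    n = p * q
    m_p = pow(c, (p + 1) // 4, p)  # square root of c modulo p
    m_q = pow(c, (q + 1) // 4, q)  # square root of c modulo q
    q_inv = pow(q, -1, p)          # inverse of q modulo p
    p_inv = pow(p, -1, q)          # inverse of p modulo q
    a = m_p * q * q_inv            # congruent to m_p mod p and to 0 mod q
    b = m_q * p * p_inv            # congruent to 0 mod p and to m_q mod q
    # Chinese Remainder Theorem for the four sign combinations.
    return sorted({(a + b) % n, (a - b) % n, (-a + b) % n, (-a - b) % n})
```

For the toy key p = 7, q = 11, the message 20 encrypts to 15, and decryption returns the four candidates 13, 20, 57 and 64, among which the receiver must still pick the intended plaintext.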

It is not guaranteed that a decryption can be uniquely determined in the Rabin cryptosystem. Letting $M'$ and $M^*$ be 

$M' = M_p q q^{-1} + M_q p p^{-1} \bmod N$ 
$M^* = M_p q q^{-1} - M_q p p^{-1} \bmod N$ 

where $q^{-1}$ denotes the inverse of $q$ modulo $p$ and $p^{-1}$ the inverse of $p$ modulo $q$, the four decrypted messages are $M', -M', M^*, -M^*$. If we have an algorithm $A$ running in polynomial time $F(N)$ that decrypts $M$ from $C$ with non-negligible probability, we can use $A$ to factorize $N$ in $2F(N) + 2\log_2 N$ steps as follows [7]. Pick $M \in Z_N$ at random, compute the ciphertext $C = M^2 \bmod N$, and then apply $A$ to decrypt $C$. $A$ gives either $M^*$ or $-M^*$ with probability 1/2. Then, by computing $\gcd(M - M^*, N) = \gcd(2 M_q p p^{-1}, pq)$, we have $p$ (or $q$ for $-M^*$). 

¹ The Complete Subtree (CS) method in [4] uses the same key tree as the LKH method but assumes stateless receivers, whereas in the LKH method receivers are stateful. 

The Rabin decryption is not unique, hence the Rabin digital signature has a restricted domain over $Z_N$. A necessary and sufficient condition for a given ciphertext $C$ to be signed is that $C$ is a quadratic residue modulo $N$, namely $C \in QR_N$. Anyone who knows the primes $p$ and $q$ is able to test whether a given $C$ is a quadratic residue by the Legendre symbols, 

$\left(\frac{C}{p}\right) = C^{(p-1)/2} = 1 \pmod{p}$, (1) 

$\left(\frac{C}{q}\right) = C^{(q-1)/2} = 1 \pmod{q}$. 
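
A minimal sketch of this test, using Euler's criterion to evaluate both Legendre symbols, is given below. The function name is an assumption for this example. Note that the check returns False for values divisible by p or q, so it counts only the nonzero residues.

```python
def is_quadratic_residue(c, p, q):
    """Test whether c is a nonzero quadratic residue modulo both primes p and q.

    Uses Euler's criterion: c^((p-1)/2) is 1 modulo p exactly when c is a
    nonzero square modulo the odd prime p.
    """
    return pow(c, (p - 1) // 2, p) == 1 and pow(c, (q - 1) // 2, q) == 1
```

For p = 7 and q = 11, exactly 3 * 5 = 15 of the values 1..76 pass the test, i.e. slightly under one quarter of them.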

3 The Proposed Protocol 

3.1 Rabin Tree 

Let $p_L, p_R, q_L, q_R$ be safe primes and $N_L = p_L q_L$, $N_R = p_R q_R$. Without loss of generality, let $N_L < N_R$. 

A Rabin tree is a (binary) tree with $2n - 1$ nodes $v_1, v_2, \ldots, v_{2n-1}$ and corresponding $2n - 1$ keys satisfying 

$x_i \in QR_{N_L} \cap QR_{N_R}$ (2) 

for every $x_i$ such that $i < n$, and 

$x_i = x_{2i}^2 \pmod{N_L}$, (3) 

$x_i = x_{2i+1}^2 \pmod{N_R}$. 

Note that $N_L \neq N_R$, which is necessary for preventing collusion of receivers. 

Figure 1 illustrates a sample Rabin tree of $2n - 1 = 15$ nodes. The root key $x_1$ equals the Rabin encryption of the left child, defined by $E[x_2] = x_2^2 \pmod{N_L}$, and also that of the right child, defined by $E[x_3] = x_3^2 \pmod{N_R}$. Recursively, every parent and its two child nodes are assigned so that Eq. (3) holds. 

3.2 Constructing Rabin Tree 

A Rabin tree is built by the following probabilistic algorithm. 

1. Generate safe primes and the public moduli $N_L$, $N_R$. 

2. Choose at random a root key $x_1 \in Z_{N_L}$ that satisfies Eq. (2). Set $i = 1$. 

3. Apply the Rabin decryption algorithm (e.g. the one choosing $p$, $q$ as Blum numbers in [7]) to $x_i$ and let two of the decrypted messages be the keys for the child nodes, $x_{2i}$ and $x_{2i+1}$. 

4. Test whether the chosen key satisfies the condition of the Rabin tree, Eq. (2), using the Legendre symbols in Eq. (1). Repeat the test with the other decrypted messages until the key satisfies the condition. 

5. If none of the four decrypted messages satisfies Eq. (2), the node is called a failure node and the algorithm must go back to Step 3 to choose a new parent node key. (If the parent node is also a failure node, go back to the ancestor nodes.) 

6. Output the keys $x_1, x_2, \ldots, x_{2n-1}$ that satisfy Eq. (2). 

Fig. 1. Sample Rabin Tree: U1, U2, ..., U8 are receivers having the individual keys indicated at the leaves, $x_8, x_9, \ldots, x_{15}$, which are used to generate any key on the path to the root. 

A Rabin tree is not uniquely determined for a given root key $x_1$. The above algorithm may involve a large number of backtracking steps caused by failure nodes. The following property guarantees the availability of the construction algorithm. 
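
The construction can be sketched as a recursive search with backtracking. The sketch below is an illustration under several assumptions that are not taken from the paper: all primes are congruent to 3 modulo 4, candidate root keys are scanned in increasing order rather than sampled at random (to keep the run reproducible), and every internal key is additionally required to be smaller than N_L so that squaring a child key reproduces the parent key exactly. The function names and the toy primes in the usage note are hypothetical.

```python
def is_qr(c, p, q):
    """Euler's criterion modulo both primes (nonzero residues only)."""
    return pow(c, (p - 1) // 2, p) == 1 and pow(c, (q - 1) // 2, q) == 1


def square_roots(c, p, q):
    """Four square roots of c modulo p*q, assuming p % 4 == q % 4 == 3."""
    n = p * q
    a = pow(c, (p + 1) // 4, p) * q * pow(q, -1, p)
    b = pow(c, (q + 1) // 4, q) * p * pow(p, -1, q)
    return sorted({(a + b) % n, (a - b) % n, (-a + b) % n, (-a - b) % n})


def build_subtree(i, x, n, left, right):
    """Assign keys to the subtree rooted at heap index i whose key is x.

    left and right are the prime pairs (p_L, q_L) and (p_R, q_R).
    Returns a dict {index: key}, or None if node i is a failure node.
    """
    if i >= n:
        return {i: x}  # leaves need not satisfy Eq. (2)
    n_left = left[0] * left[1]
    # Assumption of this sketch: internal keys stay below N_L.
    if x >= n_left or not (is_qr(x, *left) and is_qr(x, *right)):
        return None
    keys = {i: x}
    for child, primes in ((2 * i, left), (2 * i + 1, right)):
        for root in square_roots(x, *primes):
            subtree = build_subtree(child, root, n, left, right)
            if subtree is not None:
                keys.update(subtree)
                break
        else:
            return None  # all four decrypted messages failed: backtrack
    return keys


def build_rabin_tree(n, left, right):
    """Scan candidate root keys until a full tree with n leaves is found."""
    n_left = left[0] * left[1]
    for x1 in range(2, n_left):
        keys = build_subtree(1, x1, n, left, right)
        if keys is not None:
            return keys
    return None
```

For example, `build_rabin_tree(4, (167, 179), (227, 263))` returns seven keys in which each parent key equals the square of its left child modulo N_L and of its right child modulo N_R. A deterministic scan tends to find small, structured root keys, so a real deployment would sample the root at random as in Step 2.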

Theorem 1 Let $N_L$ and $N_R$ be two independent composite numbers whose quadratic residues are independently determined. A key $x_1 \in Z_{N_L}$ can be the root node of a Rabin tree of $2n - 1$ nodes with probability $P_{RT}$ given by 

$P_{RT} = P_{QR} (1 - P_F^4)^{n-1}$, 

where $P_{QR}$ is the probability that a random element in $Z_{N_L}$ satisfies Eq. (2) and $P_F = 1 - P_{QR}$. 

Proof. The number of quadratic residues in $Z_p$ is $(p - 1)/2 + 1$. Hence, an element $x \in Z_N$ is a quadratic residue for $N = pq$ with probability 

$\frac{(p-1)/2 + 1}{p} \cdot \frac{(q-1)/2 + 1}{q} = \frac{1}{4} + \frac{1}{4p} + \frac{1}{4q} + \frac{1}{4pq} \approx \frac{1}{4}$. 

Moreover, $x$ must be a quadratic residue for both $N_L$ and $N_R$ to satisfy Eq. (2). Thus, the probability that a given $x$ can be a successful root of a Rabin tree, i.e. satisfies Eq. (2), is $P_{QR} = (1/4)^2 = 1/16$, and the probability of a failure node is $P_F = 1 - P_{QR}$. Since we have four chances to choose a key at a child node for one parent key, a node can have a child with probability $1 - P_F^4$. A Rabin tree of $n$ leaves has $n - 1$ internal nodes, for which Eq. (2) always holds. Therefore, the probability that a given key is the root key of a Rabin tree is $P_{RT} = P_{QR} (1 - P_F^4)^{n-1}$, and we have the theorem. □ 

Unfortunately, the probability $P_{RT}$ becomes exponentially small with the number of receivers $n$. For instance, $P_{QR} = 1/16$ and $1 - P_F^4 = 0.2275$. 
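
To give a feeling for how quickly this probability shrinks, the following short sketch evaluates the formula of Theorem 1, taking P_QR = 1/16 and the per-node success probability 1 - P_F^4 with P_F = 1 - P_QR as stated in the proof. The function name is an assumption for this example.

```python
def root_success_probability(n, p_qr=1 / 16):
    """P_RT = P_QR * (1 - P_F**4) ** (n - 1), with P_F = 1 - P_QR."""
    p_f = 1 - p_qr
    return p_qr * (1 - p_f ** 4) ** (n - 1)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, root_success_probability(n))
```

Each additional internal node multiplies the probability by about 0.2275, which is the exponential decay noted above.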



3.3 Security 

A single leaf node key can be used to generate all node keys on the path to the root node in a secure manner, that is, by Rabin encryption. Conversely, it is infeasible to obtain a child node key from a parent key, as follows. 

Proposition 1 Without knowledge of the prime factors of $N_L$ and $N_R$, the difficulty of computing a child node key from a given parent node key is equivalent to that of decrypting the Rabin encryption, which is equivalent to that of the integer factorization problem. 

Proof. It is straightforward from the definition of the Rabin tree. □ 

The next property concerns the secrecy of an individual key against other legitimate receivers. 

Proposition 2 Let $x_i$ and $x_{i+1}$ be sibling nodes having a common parent. Knowledge of $x_i$ implies the sibling node key $x_{i+1}$ if $N_L = N_R$. 

Proof. Let us assume $N_L = N_R$. Then $x_i$ and $x_{i+1}$ are two elements of $\{M, -M, M^*, -M^*\}$ in the symbols defined in Section 2.2. When the sibling node keys are $M$ and $-M$, it is trivial that $x_{i+1} = -x_i \bmod N_R$. When they are $M$ and $M^*$, by dividing $N_R$ by $\gcd(M - M^*, N_R) = p$, we can perform Rabin decryption and obtain $x_{i+1}$ from the common parent key. Other cases such as $M$ and $-M^*$ are similar. Consequently, in every case, whoever has $x_i$ is able to know the sibling key $x_{i+1}$. □ 

The following property ensures that a Rabin tree is resilient against collusion attacks. 

Proposition 3 Let $x_i$ and $x_{i+1}$ be sibling nodes having a common parent. Putting together $x_i$ and $x_{i+1}$ does not reveal any of the prime factors of $N_L$ and $N_R$. 

Table 1. Performance estimation in broadcast key distribution algorithms 

algorithm                                  Asano [1]         LKH (CS) [2]      SD [4]                        Rabin Tree
# of ciphertexts (bandwidth)               r log(n/r) + 1    r log(n/r) + 1    2r - 1                        r log(n/r) + 1
# of keys at receiver storage              1                 log n             (log^2 n + log n)/2 + 1       1
# of public primes at receiver storage     2^n - 1           -                 -                             -
computations at receivers                  log n             -                 -                             log n

4 Rabin Tree Based Broadcast Key Distribution 

In this section, we present an application of the Rabin tree to broadcast key distribution. In this model, we have a key distribution authority $A$, a contents provider $P$, and a set of receivers $N$. 

1. Authority $A$ executes the construction algorithm in Section 3.2 to generate a Rabin tree $T$ of $2n - 1$ nodes. Then, $A$ distributes the $n$ keys at the leaves, $x_n, x_{n+1}, \ldots, x_{2n-1}$, one to each of the receivers in $N$ over a secure channel. 

2. In the same way as in the LKH method, provider $P$ determines $m$ subtrees of $T$, characterized by the root nodes $i_1, i_2, \ldots, i_m$, that contain all legitimate receivers in $N \setminus R$, encrypts message $M$ with a securely generated symmetric key $sk$, and broadcasts $m + 1$ ciphertexts 

$E_{x_{i_1}}(sk), E_{x_{i_2}}(sk), \ldots, E_{x_{i_m}}(sk), E_{sk}(M)$. 

3. A receiver with key $x_j$ searches the path from its leaf to the root until it finds a node that matches any of the nodes $i_1, \ldots, i_m$ in the broadcast message. The parent key $x_{\lfloor j/2 \rfloor}$ of node $j$ can be obtained from $x_j$ by 

$x_{\lfloor j/2 \rfloor} = \begin{cases} x_j^2 \bmod N_L & \text{if } j \text{ is even}, \\ x_j^2 \bmod N_R & \text{otherwise}, \end{cases}$ 

and the receiver goes on searching $T$ up to the root. 

The proposed method requires the same ciphertext size as the LKH method, a single individual key $x_i$ of modulus size, and a computation overhead of O(log n) time to extract the session key $sk$. 
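
The receiver-side search of Step 3 can be sketched as follows. The function names and the representation of the broadcast header as a set of node indices are assumptions made for this example; decrypting the session key with the key that is found is left out.

```python
def parent_key(j, x_j, n_left, n_right):
    """Key of node j // 2 derived from the key x_j of node j by one squaring."""
    modulus = n_left if j % 2 == 0 else n_right
    return pow(x_j, 2, modulus)


def find_cover_key(j, x_j, cover, n_left, n_right):
    """Walk from node j toward the root until a node listed in cover is met.

    Returns (index, key) of that node, or None if the receiver is revoked,
    i.e. no node on its path appears in the broadcast.
    """
    while True:
        if j in cover:
            return j, x_j
        if j == 1:
            return None
        x_j = parent_key(j, x_j, n_left, n_right)
        j //= 2
```

As a toy check with N_L = 7 * 11 = 77 and N_R = 19 * 23 = 437, the leaf keys x_2 = 2 and x_3 = 435 both lead to the root key 4, since 2^2 mod 77 = 4 and 435^2 mod 437 = 4. At most log n squarings are needed, which matches the O(log n) overhead stated above.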

4.1 Estimation 

Table 1 shows the performance evaluation of the proposed method in comparison with some of the existing protocols. Note that the evaluation is based on [1]. We show only the binary tree case, though [1] examines the more general k-ary tree case. In the table, the number of public primes is the number of all possible prime numbers necessary for resolving node keys. 




Rabin Tree and Its Application to Group Key Distribution 391 



5 Conclusion 

We have presented the Rabin tree and its application to broadcast key distribution. We have shown that building a Rabin tree involves a large amount of backtracking and is thus a time-consuming process. It is, however, executed only once, at the initialization step, so a long running time is acceptable. Future issues include a more efficient algorithm for Rabin tree construction and a reduction in ciphertext size. 

Abstract. Real-time traffic requires rapid corrective action to counteract the negative effect of network faults. In this paper we propose such a detection and rerouting scheme for VoIP traffic. We use an RTP/RTCP-based detection method to quickly detect network problems. Subsequent packets can then be rerouted using an overlay network approach that avoids failed links and paths with inadequate QoS. This rapid detection and rerouting minimizes the impact of network failures on real-time applications such as VoIP. In this paper, we present the main ideas behind these proposals along with some implementation considerations. 



1 Introduction 

Voice over IP (VoIP) applications require acceptable quality of service (QoS), such as low delay, jitter, and loss, from the underlying network infrastructure. This includes high availability and reliability. Therefore, it is necessary to find ways to reduce or overcome the detrimental delays, jitter, and losses inherently associated with Internet Protocol (IP) networks. Techniques to improve the reliability and QoS characteristics of IP networks typically involve some combination of network monitoring, failure detection, and fast rerouting (e.g., MPLS [1]). 

With network monitoring, it is possible to provide voice quality feedback to a user 
participating in a VoIP session. Also, various metrics related to RTP (Real-Time 
Transport Protocol) traffic might be computed by an endpoint application and then 
presented to the user. There are two broad categories of network monitors. The first 
type, distributed monitoring, includes measurement instrumentation that is either 
distributed at different points in the network [3] [4] [5] or placed at the endpoints of 
the end-to-end path [6] [7] [8] [9]. Alternatively, network conditions can be moni- 
tored using only one centralized observation point. This centralized monitoring makes 
installation easier, but it takes longer to detect failures. 

There are two common methods of detecting failures: (i) actively injecting probes into the network, and (ii) passively sniffing existing network traffic. Many network measurement tools send periodic round-trip echo probes to a distant host, which then responds to the sender. These probes are usually ICMP Echo packets; they are called NetDyn probes in [8] [10], Netnow probes in [11], and Fping [12] probes in Imeter [13]. 

Several methods have also been proposed for rapid rerouting of packets. For in- 
stance, at the data link layer there are techniques, such as the Rapid Spanning Tree 
protocol [14]. Such technology, however, is useful only for local traffic. At higher 
levels, there are other techniques, e.g., ones based on MPLS, and multi-home routing 
[15]. Some of the methods, though, require significant changes to existing Internet 
equipment and have not been deployed or implemented. 

In this paper, we describe ways to help improve VoIP reliability and QoS by equipping network devices (e.g., media gateways, edge routers, or endpoints) with the capability to: (i) rapidly detect failures and (ii) rapidly reroute traffic using overlay networks [2]. Rapid detection and rerouting is essential for real-time applications like VoIP. To make sure VoIP packets are sent over the most suitable network paths, network conditions must be monitored. Often, though, gateways have no control over the IP network devices or the routes within the IP network. Despite this harsh constraint, we show that overlay technologies can play an important part in improving VoIP reliability and QoS by rapidly relaying packets through intermediate (overlay) nodes. 

For rapid failure detection and quick corrective actions, we propose injecting a 
“keep-alive” signal into VoIP streams. Once network degradation (i.e., network fail- 
ure or deteriorating QoS conditions) is detected, traffic is quickly rerouted, perhaps 
helping prevent bigger network outages. Research shows that for 30% to 80% of 
network paths, there exist some alternative paths with significantly superior QoS [15]. 
Now the question is how to reroute traffic to better paths once failures are detected. In 
this paper, we describe some possible ways to reroute VoIP packets after failures and 
degradation. Normally, rerouting requires routers to change their routing tables. Given 
the constraints mentioned above, though, the key elements of our method are: (i) IP 
over IP tunneling for distributed implementations, and/or (ii) creation of new sessions 
for centralized implementations. More details of the tunneling and session creation 
will be given later. The tunneling mechanism can be added to existing IP phones 
(endpoints in a distributed method), or placed in a network device to be shared by 
many endpoints or gateways or even routers. The session creation method can be 
implemented on a call server like the centralized monitoring method. Overall failure 
detection and rerouting can even be implemented as a stand-alone box. We use the 
stand-alone case as the focus of our discussion in this paper. Although we primarily 
describe the new network design in the context of VoIP network failure detection and 
packet rerouting, the general concept can also be used in other scenarios. 

In Section 2, we describe in more detail the new techniques and implementation is- 
sues. In Section 3, we discuss some bandwidth and cost tradeoffs. Our conclusions are 
in Section 4. 



2 VoIP Failure Detection and Recovery Method 

In this section, we describe in more detail a network design in which failures in a converged network are rapidly detected and then packets are rerouted to a better path in a timely manner. Conventional priority-based routing uses DiffServ bits to set classes of service so that packets receive prioritized treatment by routers. In contrast, the VoIP-aware routing we describe deals specifically with VoIP packets and uses non-local information to choose the best paths for transmitting the packets. Information about the selected paths is passed to the gateways by, for example, VoIP signaling or by parsing special bits in the packets. The proposed VoIP-aware routing solution includes two main and somewhat independent tasks: (i) monitoring the condition of the network to detect network-path failures and degradation, and (ii) using the network information to switch packets to a better path. The path switching can be performed for live, in-progress calls or just in the selection of paths for future calls. If the better path is used only for future calls (i.e., as new calls are set up), then the in-progress calls continue to suffer the worse performance (or fault) on the old path. 

To appropriately route VoIP packets, it is first necessary to monitor the network 
conditions and detect network-path failures and deteriorating QoS conditions (e.g., 
increasing packet delays, jitter, or loss). Many research papers, prototypes, and prod- 
ucts offer ways to monitor network conditions. 

One way to monitor conditions is to simply inject packets (with timestamps) and 
accurately measure their loss, delay, and jitter along various paths (by comparing the 
transmit and receive times). In this approach, endpoints are added to the network (all 
synchronized to a common clock so that they can make accurate delay measurements) 
or appropriate software is added to the gateways. The endpoints (or gateways) send 
synthesized RTP packets among themselves to collect the delay, jitter, and loss in- 
formation. This is done at frequent, regular intervals so to maintain up-to-date status 
of all network paths (not just those paths with active calls). If path problems develop, 
information about other paths is needed so that appropriate rerouting actions can be 
rapidly executed. 

Alternatively, often it is not necessary to exactly compute the network parameters 
when comparing candidate paths for VoIP-aware routing. Instead, it is sufficient to 
compare the delay, jitter, and loss factors of various paths without calculating their 
actual values. This observation greatly simplifies network monitoring - removing 
many of the hard problems (e.g., synchronization) of exact measurements. It also 
quickens the rerouting decision process. 

Suppose low-bandwidth keep-alive packets are duplicated at the source (e.g., gate- 
way) and simultaneously transmitted over multiple paths to a destination (again, e.g., 
gateway). Then the destination only needs to observe and measure the differential 
delays between receive times of the various copies to determine which paths are cur- 
rently best for VoIP. The above comparisons can be made for each pair of gateways 
of interest. Using these periodic keep-alive packets, comparisons of the delay, jitter, 
and loss factors from a gateway to the other gateways are made and the best paths 
recorded. The gateways of interest may be defined as the ones that are one hop away 
in the WAN connection; i.e., gateways whose IP routing path has one WAN connec- 
tion between them. 
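
The differential comparison can be sketched as follows: for each keep-alive probe, the destination records when each copy arrived on each path and measures every copy's lag behind the earliest one, so no clock synchronization with the source is needed. The data layout (a list of per-probe dictionaries with `None` marking a lost copy), the path names, and the ranking rule (fewest losses first, then smallest mean lag) are assumptions made for this illustration, not details given in the paper.

```python
def rank_paths(probes):
    """Rank paths best-first from destination-side arrival times.

    probes: list of dicts mapping path name -> arrival time in ms at the
    destination, or None if that copy of the keep-alive packet was lost.
    """
    lag_sum, received, lost = {}, {}, {}
    for copies in probes:
        arrived = [t for t in copies.values() if t is not None]
        earliest = min(arrived) if arrived else None
        for path, t in copies.items():
            lag_sum.setdefault(path, 0)
            received.setdefault(path, 0)
            lost.setdefault(path, 0)
            if t is None:
                lost[path] += 1
            else:
                lag_sum[path] += t - earliest  # differential delay only
                received[path] += 1

    def score(path):
        if received[path]:
            mean_lag = lag_sum[path] / received[path]
        else:
            mean_lag = float("inf")  # nothing ever arrived on this path
        return (lost[path], mean_lag)

    return sorted(lag_sum, key=score)
```

For instance, if the copies sent over a hypothetical path "via3" always arrive first and one copy on "direct" is lost, `rank_paths` places "via3" ahead of "direct".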

Once the network paths have been monitored and evaluated/compared, the second step in VoIP-aware routing is to use the information to switch VoIP packets to the best path. We consider solutions to the switching problem under two different scenarios: (A) gateways are not situated at the enterprise edge and do not have routing functions, and (B) gateways sit at the enterprise edge as BGP peers. 

In the first scenario (i.e., gateways are not situated at the enterprise edge), one solution is to use tunneling. If the best path selected is the direct path between two gateways, Gx and Gy, the original traffic is sent and no encapsulation of packets is required. Otherwise, we assume that a set of other gateways, e.g., g1, g2, ..., gn, have to participate in the best path. IP headers are added to the packets so that the traffic can travel from Gx to g1, g1 to g2, etc., and eventually gn to Gy. Gx only needs to make sure that the packets are forwarded to g1. The rest of the best path is handled automatically by the gateways g1, ..., gn, because they also keep their own best-path tables. For example, g1 should know that the next gateway in the best path is g2. In effect, the gateways provide an overlay network. This distributed intelligent solution is consistent with the IP network fundamentals. 
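
The hop-by-hop forwarding described above can be sketched as a small simulation in which every gateway keeps only its own best-next-hop table. Packets are modeled as plain dictionaries and the table layout (`best_next_hop[gateway][destination]`) is an assumption made for this illustration; a real implementation would add and remove IP headers instead.

```python
def wrap(packet, outer_dst):
    """Encapsulate a packet behind an outer header addressed to outer_dst."""
    return {"dst": outer_dst, "inner": packet}


def unwrap(packet):
    """Strip one outer header and return the encapsulated packet."""
    return packet["inner"]


def deliver(src, dst, payload, best_next_hop, max_hops=16):
    """Forward payload from src to dst using per-gateway best-path tables.

    Returns (route, payload) where route lists the gateways visited.
    """
    original = {"dst": dst, "payload": payload}
    route, node = [src], src
    for _ in range(max_hops):
        next_hop = best_next_hop[node][dst]
        # No encapsulation is needed when the next hop is the destination.
        packet = original if next_hop == dst else wrap(original, next_hop)
        node = packet["dst"]  # the packet travels to its outer destination
        route.append(node)
        if node == dst:
            return route, packet["payload"]
        original = unwrap(packet)  # intermediate gateway removes the header
    raise RuntimeError("no route found within max_hops")
```

With the tables `{"Gx": {"Gy": "g1"}, "g1": {"Gy": "g2"}, "g2": {"Gy": "Gy"}}`, a packet from Gx visits g1 and g2 before reaching Gy, while Gx itself only knows that its next hop is g1.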

In the second scenario, each gateway sits at an enterprise edge and participates as one BGP peer. The gateways are preloaded with route-control software and are installed behind the firewall at a multi-homed site as peers to the Internet routers. The gateways test how well each available ISP is performing and pick the best route based on that performance, while also factoring in how much it costs to use each route. Because it is a routing peer, a gateway can change the Internet routers’ tables using BGP so that the routers direct traffic to the best ISP. 

In the case when the gateway uses the PSTN connection as the only backup for IP 
network failure, the solution can be simplified. 



2.1 Description of the Technique 

The new method, which can be either distributed or centralized, makes use of the existing VoIP infrastructure. The sender increases the number of RTCP packets when the number of RTP packets decreases. The receiver detects a failure if neither RTP nor RTCP packets are received, and notifies the users involved in the sessions to change the routing path. 

The failure detection and rerouting box (FDR) can be placed at any of four places: (i) inside the IP phone application, (ii) inside the RTP/RTCP stack of the IP phone, (iii) inside the edge device that connects the enterprise network to the Internet, or (iv) as an independent box next to one or more IP phones or an edge device controlled by a call server. Missing RTP and RTCP packets can be monitored at the endpoint (i.e., distributed) or at the call server (i.e., centralized). The centralized method uses the call server to detect existing calling sessions and their corresponding RTP/RTCP packets. It informs the end-user device of network failures after detecting missing RTP/RTCP packets. It also sets up new sessions using new routing paths. The distributed method uses an extension to the end-user IP phone or gateways to adjust the RTP/RTCP packet transmission rate and detect missing packets. The frequency of RTCP packets increases as the number of RTP packets decreases. The affected IP phone or gateways also control the rerouting of packets. 

Figure 1 shows the setting of our method. It includes network components such as IP phones, gateways and servers. The FDR box can be implemented inside these devices or as an independent box. 




Fig. 1. A general architecture of failure detection and rerouting method 



For a distributed implementation involving Phone1 and Phone2, we now briefly describe a typical failure detection and rerouting scenario. First, the FDR boxes of Phone1 and Phone2 inject some (short) RTCP packets into the session stream whenever no RTP packets are sent (i.e., during silence periods). Both FDR boxes monitor the incoming RTP and RTCP packets from the other party. If one box does not receive either RTP or RTCP packets, it detects a network failure and attempts to notify the other party. The notification can be sent simultaneously through multiple paths to ensure that the other party receives the failure notification and reroutes subsequent packets. For example, if Phone1 detects a failure, it notifies Phone2 that a failure occurred on the path from Phone2 to Phone1 by sending two simultaneous packets: one directly from Location 1 to Location 2 and the other from Location 1 through Location 3 to Location 2. The question of how to achieve this second path will be discussed later. If Phone2 also detects a failure, then a failure is detected that affects the traffic in both directions. Once Phone1 receives the notification, it reroutes all the subsequent packets from Location 1 through Location 3 to Location 2. 

We use a tunneling method to control the rerouting of packets in the distributed method. Consider the situation described previously. Once Phone1 is notified that a failure is detected on the path from Phone1 to Phone2, the FDR box of Phone1 sends a message to inform an FDR in Location 3 (FDR-three) of the upcoming special packet stream. Phone1 then wraps each packet inside another IP packet with the header pointing to FDR-three. When FDR-three receives such a packet, it takes off one header and forwards the rest of the packet. 

For a centralized implementation, a typical scenario would also make use of the edge device and its corresponding call server. Again, the FDR box can be implemented: (i) inside the call server application, (ii) inside the call server RTP/RTCP stack, or (iii) as an independent box next to the edge device. When a call between Phone1 and Phone2 occurs, they both register their sessions with the call server. The server monitors physical, link and IP layer connectivity, as well as RTP and RTCP traffic between Phone1 and Phone2. When it detects network failures, it notifies all affected sessions and endpoints. It creates two new sessions for the traffic between Phone1 and Phone2: one from Phone1 to an FDR in Location 3 (FDR-three) and the other from FDR-three to Phone2. Even if the affected endpoints are not active, failure detection can prevent subsequent calls from using failed paths. 

2.2 Some Benefits and Applications of the Technique 

Benefits of the new technique include reduced failure detection time, less impact of 
failures and improved network reliability for VoIP traffic. Its applications include 
rerouting to better QoS paths for phones, gateways or other network devices. 

Early failure detection allows prompt reaction before it causes a bigger network 
breakdown. In addition, failure information can be used to inform a “smart router” of 
failures on a certain path so that a different path will be selected for future transmis- 
sions. 

Multiple path reservation for VoIP improves QoS of VoIP traffic without increas- 
ing costs, because it does not require changes to existing routing devices. In fact, FDR 
boxes can be easily plugged in anywhere in the network and distributed as freeware 
because of their low cost. Furthermore, since it becomes harder for intruders to guess 
the actual path of the voice traffic, this new technique also improves the security of 
VoIP traffic. 

2.3 Some Implementation Issues 

Figure 2 shows an example of the distributed method implemented at the end-user applications. A typical FDR box includes three major components: an RTP/RTCP sender for sending additional RTCP packets, a failure detector for detecting failures and notifying the other party, and a rerouting handler that either wraps packets in an additional header or takes off additional headers before forwarding. The RTP/RTCP sender is designed to increase the number of RTCP packets when the number of RTP packets decreases, most likely during silence suppression. If neither RTP nor RTCP packets are received, the failure detector detects an incoming network failure. It sends feedback to the RTP/RTCP sender to transmit an additional RTCP packet to the other party. Once the other party receives such a failure notification RTCP packet, it can notify its rerouting handler to select a different path. If it does not receive the notification, a two-way network failure is detected and both rerouting handlers will be notified to wrap the packets and send them over a different path. 

Fig. 2. Adjustable RTCP failure detection and rerouting 

Important implementation issues of the new technique include coordinating the 
number of RTP and RTCP packets sent by the RTP/RTCP sender component, the 
failure detection criteria when none of the packets are received within a certain time 
interval, and how to effectively catch packets for wrapping before forwarding into the 
network. 

The first question is when and how to inject RTCP packets into existing RTP 
streams. 

The second question is how many packets must be missing before a failure is
declared. We implemented this factor as a parameter of our software. Based on our
experimental and analytical results, we found that a factor of three, i.e., three
consecutive missing packets, is sufficient to detect a network failure [16].
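
The detection criterion can be sketched as a counter of consecutive missed packet
intervals. This is a minimal illustration rather than the software described above: the
class name `FailureDetector`, its method names, and the idea of a timer that fires once
per expected packet interval are assumptions made for the example. Only the threshold
of three consecutive missing packets comes from the text.

```python
class FailureDetector:
    """Declare a failure after `threshold` consecutive expected packets are missed."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # the configurable "factor"; three per the text
        self.missed = 0             # consecutive misses seen so far

    def on_packet(self):
        """Any RTP or RTCP packet arrived: the path is alive, reset the counter."""
        self.missed = 0

    def on_interval_expired(self):
        """An expected packet interval elapsed with no packet received.

        Returns True once the number of consecutive misses reaches the threshold.
        """
        self.missed += 1
        return self.missed >= self.threshold


# Two misses, one packet (which resets the count), then three misses in a row.
detector = FailureDetector(threshold=3)
events = ["miss", "miss", "packet", "miss", "miss", "miss"]
failed = False
for event in events:
    if event == "packet":
        detector.on_packet()
    else:
        failed = detector.on_interval_expired()
print(failed)
```

Because RTP and RTCP packets both reset the counter, raising the RTCP rate during
silence suppression shortens the expected interval and therefore the detection time.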

The third question is how to catch and wrap packets. Fig. 3 shows the location and
flow of the reroute handler within a Linux kernel. It intercepts packets and wraps or
unwraps them before sending them to the IP layer for transmission.

When the voice packets pass through the reroute handling device, traditionally, the 
network card sends the packets to the IP layer for routing and forwarding. In our 
design, instead of the IP layer, the packets are sent to the packet interceptor, which 
forwards packets to the reroute handler. 

The packet interceptor is very important. First, it needs to communicate with the
failure detector to find out whether rerouting is necessary. Second, when it is
informed that a failure is detected, it needs to grab the subsequent packets and make
sure that the packets are not forwarded to the IP layer. Third, it must construct and
forward packets to the reroute handler in a timely manner, i.e., its delay cannot exceed
the delay of constructing a packet at the IP layer.

Fig. 3. Location and flow of Reroute Handler

In the tunneling method, the reroute handler’s main function is to quickly wrap and 
unwrap packets. In order for the reroute handler to decide which IP address to use in 
the wrapper, it must keep a list of available alternative routes. The alternative route 
information is stored in the “table” component of Fig. 3. 

The content of the “table” is specified by the topology of the overlay network. If
the rerouting device knows some other rerouting devices in the overlay network, it
can choose which one to use for packet relay. In fact, it can have a prioritized list of
next-hop rerouting devices. The prioritization can be based on the QoS information
collected during the failure detection.
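
A prioritized list of next-hop rerouting devices can be sketched as a small table keyed
by address and ordered by a QoS score. The structure below is hypothetical: the class
`RerouteTable`, the single numeric score (lower is better), and the example addresses
are assumptions for illustration, since the text does not specify how the “table” is
organized.

```python
class RerouteTable:
    """Prioritized list of next-hop rerouting devices (hypothetical structure)."""

    def __init__(self):
        self._hops = {}  # next-hop address -> QoS score (lower is better)

    def update(self, address, qos_score):
        """Record the latest QoS measurement for a known rerouting device."""
        self._hops[address] = qos_score

    def remove(self, address):
        """Forget a device, for example after the path to it has failed."""
        self._hops.pop(address, None)

    def prioritized(self):
        """Return next-hop addresses ordered from best to worst score."""
        return sorted(self._hops, key=self._hops.get)

    def best(self):
        """Return the preferred next hop, or None if no alternative is known."""
        ordered = self.prioritized()
        return ordered[0] if ordered else None


table = RerouteTable()
table.update("192.0.2.3", 40.0)  # illustrative scores, e.g. measured delay in ms
table.update("192.0.2.4", 25.0)
print(table.best())
```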

Consider again the example in Fig. 1 when the link between Location 1 and Location 2
breaks. The rerouting device at Location 1 intercepts all packets to be sent to Location
2 and wraps them in an IP header that points to Location 3. The IP address of
Location 3 is stored in the rerouting table of the rerouting device at Location 1. When
the rerouting device at Location 3 receives these packets from Location 1, it unwraps
them and forwards them to their original destination, which is Location 2 in this
example.
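
The wrap and unwrap steps of this example can be modeled in a few lines. The sketch
below is a simplified model, not kernel code: packets are plain objects rather than real
IP datagrams, and the names `Packet`, `wrap`, and `unwrap` as well as the location
labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object  # voice data, or an inner Packet when tunneled


def wrap(packet, local, relay):
    """Encapsulate `packet` in an outer header addressed to the relay device."""
    return Packet(src=local, dst=relay, payload=packet)


def unwrap(packet):
    """Strip the outer header and return the original packet."""
    if not isinstance(packet.payload, Packet):
        raise ValueError("packet is not tunneled")
    return packet.payload


# Location 1 wraps a packet bound for Location 2 and sends it via Location 3.
original = Packet(src="location1", dst="location2", payload=b"voice")
tunneled = wrap(original, local="location1", relay="location3")

# Location 3 removes the outer header and forwards the original packet.
forwarded = unwrap(tunneled)
print(tunneled.dst, forwarded.dst)
```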

Another possible implementation is to let the call session manager of a server keep
track of the RTP and RTCP traffic on each session and thereby detect failures. An
active end-user notification can be made from a network device (e.g., an edge router
or media gateway) to an application that the user is interacting with, which then
signals this notification (probably processed by rules) to the end device. If these
network devices determine that a network problem has occurred (e.g., from network or
reconfiguration messages that they receive), they can send notifications about it to end
devices so that new sessions are set up on different paths. Alternatively, if network
devices maintain information about open RTP sessions (which may require some
capability like wire-speed string matching), then they can selectively reroute traffic
from only the users in those sessions (perhaps by introducing packets into the RTCP
sessions). In both cases, since a network problem (e.g., reconfiguration) may not
necessarily affect a particular session, end applications must use such network
notifications in conjunction with other factors (like not receiving packets for some
time period) when deciding whether to reroute to a different path.



3 Bandwidth and Cost Tradeoffs/Issues 

In this section, we study two kinds of tradeoffs. The first is the tradeoff between the
rate of inserted RTCP traffic and the time it takes to detect failures. The second is the
tradeoff between the overhead of additional IP headers and the forwarding speed.

For the first one, we assume that the RTCP rate is never greater than the RTP rate,
because the overhead of failure detection should not exceed 50% of the traffic. In
general, the more RTCP packets are injected, the faster the failure detection, but the
higher the overhead.

The additional “keep alive” RTCP packets can be sent as SDES packets [17, Sec- 
tion 6.4], with a source count (SC) of 0. The size of the data portion of such a packet 
is only 4 octets (excluding the 8 octet UDP and 20 octet IP headers). In comparison, 
the sender report RTCP packet is larger and requires statistics collection. Application- 
defined RTCP packets [17, Section 6.6] may also be considered for this purpose. The 
RTP RFC [17, Section 6.2] also recommends against sending too many RTCP pack- 
ets. However, the spirit in which this recommendation is made is adhered to in our 
proposal, since the additional RTCP packets are sent only when regular RTP packets 
are not (during silence suppression, for example), and hence any prenegotiated band- 
width limitations should not be exceeded. In particular, note that the SDES packet is 
smaller than an RTP packet, given the mandatory 12-octet RTP header. 
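
The size argument can be checked by building such a packet. The following sketch
assumes the standard RTCP common header layout (2-bit version, padding bit, 5-bit
source count, 8-bit packet type, 16-bit length in 32-bit words minus one) and the SDES
packet type value 202; the function name `build_empty_sdes` is hypothetical, and the
sketch covers only the header layout, not when or how the packet is sent.

```python
import struct

RTCP_VERSION = 2
RTCP_PT_SDES = 202  # RTCP packet type for source description (SDES)
UDP_HEADER_OCTETS = 8
IP_HEADER_OCTETS = 20


def build_empty_sdes():
    """Build an RTCP SDES packet with source count 0 (header only, 4 octets)."""
    first_byte = RTCP_VERSION << 6   # version 2, no padding, SC = 0
    length_words_minus_one = 0       # one 32-bit word in total
    return struct.pack("!BBH", first_byte, RTCP_PT_SDES, length_words_minus_one)


packet = build_empty_sdes()
on_wire = len(packet) + UDP_HEADER_OCTETS + IP_HEADER_OCTETS
print(len(packet), on_wire)
```

The keep-alive is 4 octets of RTCP data, or 32 octets including the UDP and IP headers,
which is less than the 12-octet RTP header plus the same UDP and IP headers.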

In [16], we derive a formula of RTCP rate as related to RTP rate, failure detection 
time and overhead, which we hope can be used to derive the optimal RTCP rate in the 
actual implementation. 

For the study of the second tradeoff, between speed and bandwidth, we use the
following formula. Each IP header is 20 octets. Suppose the payload of a certain
codec is P octets. Then the bandwidth overhead is 20/(20+20+P). For the G.711 codec
with a payload of 160 octets, the overhead is 10%. If this is too much, we can apply a
compression algorithm to reduce the bandwidth overhead. The tradeoff, however, is
that packets take longer to forward because every packet must be decompressed
before forwarding.
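
The formula is easy to evaluate for other payload sizes. The short sketch below
transcribes 20/(20+20+P) directly; the function name `wrapping_overhead` and the
20-octet payload used as a second data point are illustrative assumptions.

```python
EXTRA_IP_HEADER_OCTETS = 20


def wrapping_overhead(payload_octets):
    """Bandwidth overhead of one additional IP header: 20 / (20 + 20 + P)."""
    return EXTRA_IP_HEADER_OCTETS / (20 + 20 + payload_octets)


g711 = wrapping_overhead(160)          # 20 / 200
small_payload = wrapping_overhead(20)  # 20 / 60
print(f"{g711:.1%} {small_payload:.1%}")
```

The overhead grows quickly as the payload shrinks, which is when compressing the
additional header becomes more attractive despite the extra forwarding delay.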

4 Conclusions 

In this paper, we proposed a method for rapid network failure detection and rerouting
for VoIP traffic. It reduces failure detection time by coordinating the sending
frequency of RTP and RTCP packets. It reduces the impact of network failures and
degradation by rerouting to better paths. It improves traffic security because of the
uncertainty in path selection. There are three key aspects to this new design. First, it
detects failures rapidly with low cost. Second, it recognizes that for about 30% to
80% of network routing paths, there exists at least one alternative path with better
QoS than the default one. Our technique exploits these extra paths to improve
network reliability without additional cost. Third, it also exploits the additional paths
to improve traffic security.

In the future, we plan to experimentally determine various failure detection speeds
and rerouting capacities. Also, it would be interesting to measure the restoration times
in real networks and compare them with the detection and rerouting times attainable
with our design.

Abstract. Software quality assurance has received considerable attention in
software projects because of the increasing complexity of both system scale and
technology diversity. Meanwhile, many international standards and development
guides, such as DO-178B, EN50128, ISO/IEC 12207, and ISO/IEC 15504, may be
referenced during project development in different application domains. In this
paper, we investigate the issue of process improvement in a software-intensive
organization and establish a quality-enhanced system scheme that emphasizes
software verification and validation (V&V) in terms of capability levels and
integrity levels. Considering the practical resources of a middle-scale
software-intensive organization in Taiwan, we propose a feasible, efficient, and
economical roadmap for software process improvement, regardless of whether the
company is ISO 9001:2000 registered. The roadmap also provides a shortcut to
enhance V&V tasks for ISO 9001 registered software organizations.

Keywords: Software Verification and Validation, Software Process Improvement (SPI),
Software Capability Level, Software Integrity Level, CMMI, ISO 9001:2000

1 Introduction 

In the literature, quality enhancement of software systems [12] has been widely
investigated. Essentially, software validation ensures that a software system meets
the user’s requirements. To satisfy the objectives of the software validation process,
both static and dynamic techniques of system checking and analysis are usually
employed. However, static techniques can only check the correspondence between
software and its specification, i.e., the so-called verification process. They cannot
demonstrate sufficiently that the software is operationally valid.

Nowadays, there are numerous maturity models, standards, methodologies, and
guidelines that can help an organization improve the way it does business. For instance,
DO-178B provides guidance for determining that the software aspects of airborne
systems and equipment comply with airworthiness requirements, EN50128 gives
guidelines for software development in the field of railway signaling systems for
preventing the introduction of software faults, and ISO/IEC 15504 [10] establishes a
quantitative standard in the area of software process assessment. Among them, we
investigate in this research the full set of CMMI models released by the Software
Engineering Institute (SEI) in January 2002.

The Capability Maturity Model Integration (CMMI) [7, 17, 18], developed by the
Software Engineering Institute at Carnegie Mellon University, has proven effective for
achieving product and process improvement and is widely accepted and implemented
in the IT area. However, implementing the CMMI framework to reach a higher maturity
level usually requires a large project budget and considerable effort. This may become
a bottleneck in promoting CMMI to middle-scale software organizations, which are
common in Taiwan. Accordingly, since most current information technology companies
in Taiwan are ISO 9000 registered, we propose a software quality-enhanced framework
for the middle-scale, ISO 9001:2000 [11, 13] registered software organization to
perform the required V&V tasks from the perspective of the Process Areas (PAs) of the
CMMI model.

In this paper, after briefly discussing the primary difference between the
continuous representation and the staged representation of a CMMI model, we suggest
a framework based on the continuous representation for a middle-scale organization.
We conclude that the continuous representation is easier to promote and implement for
reaching a higher capability level, with a smaller budget and fewer human resources
assigned to software process improvement.

With reference to the related international standards, such as IEEE Standard
1012-1998, Standards for Software Verification and Validation Plans [8], IEEE Standard
730-2002, Standards for Software Quality Assurance Plans, and ISO/IEC 12207,
Software Life Cycle Processes [9, 20], we summarize the minimal V&V tasks and use a
software-integrity-level scheme that quantifies system criticality based upon the
intended use and application of the software. By integrating the software integrity
level with the capability level of the continuous-representation CMMI model, these
minimal V&V tasks establish a stepwise roadmap from capability level 1 to level 4. The
proposed roadmap for improving the software process within a middle-scale
organization provides an effective, efficient, and economical approach regardless of
whether the company is ISO 9001:2000 registered.



2 Concerned Standards 

Essentially, the full set of CMMI models released by SEI in January 2002 aims to
provide guidance for improving an organization’s processes and its ability to manage
the development, acquisition, and maintenance of its software products. Furthermore,
a CMMI model may be useful for appraising organizational maturity or process area
capability, establishing priorities for improvement, and implementing these
improvements [21].

In CMMI models, process areas describe key aspects of such processes as 
requirements management, configuration management, verification, validation, and 
many others. Specifically, a process area (PA) provides a list of the required practices to 
achieve its intended goals, but it does not describe how an effective process is executed, 
e.g., entrance and exit criteria, roles of participants, resources. 

Currently there are two types of CMMI model representations [4]: staged and
continuous. The staged representation uses predefined sets of process areas to define an
improvement path for an organization. This improvement path is described by
maturity levels. A maturity level is a well-defined evolutionary plateau toward achieving
improved organizational processes. In contrast, the continuous representation allows
an organization to select a specific process area and improve relative to it. This
representation uses capability levels to characterize improvement relative to an
individual process area.

Because each representation has advantages over the other, an organization has to
decide which CMMI representation best fits its process improvement needs.
Although SEI employs equivalent staging to compare results from both representations,
it is somewhat difficult to make an optimal decision for an organization.

On the other hand, the ISO 9000 family of international quality management
standards and guidelines has become an international reference basis for establishing
quality management systems. In particular, ISO 9001:2000 specifies requirements for a
quality management system where an organization needs to demonstrate its ability to
consistently produce products that meet customer and applicable regulatory
requirements, and aims to enhance customer satisfaction through processes for
continual improvement of the system [5, 6].

Given the popularity of the ISO 9000 family and the CMMI models, relationships
between the two have been studied. Historically, Paulk compared ISO 9001 with the
18 Key Process Areas of the CMM [15] and described the relationship between ISO 9001
and the CMM. The content of the CMMI models published by SEI in 2002, however,
differs considerably from that of the CMM [19]. In this paper, some mapping results
of the ISO 9001:2000 clauses to their corresponding CMMI PAs are discussed in the
subsequent section.

3 The Association of CMMI Continuous Representations with ISO 9001



Since late 2000, many organizations have shown significant interest in certification
and registration under ISO 9001:2000 and in the transition from the CMM to the
CMMI. In contrast to ISO 9001:2000, which can be applied to any organization
regardless of the field in which it operates, the CMMI specifically focuses on
organizations that develop products and systems containing software. While the CMMI
provides a roadmap for achieving process capability or maturity levels [3, 16], ISO
requires all of its requirements to be fulfilled before certification can be issued.
Furthermore, both ISO and the CMMI are based on principles of systems engineering
and a process approach. In the following, we compare ISO requirements to CMMI PAs
and specific practices and depict their corresponding mappings.



3.1 Mapping CMMI Process Areas to ISO 9001:2000 Clauses

As stated in the CMMI technical report [17], the Requirements Management (REQM)
PA essentially maintains the project requirements. It describes activities for obtaining
and controlling requirement changes and ensuring that other relevant plans and data are
kept current. Furthermore, it provides traceability of requirements from customer to
product, to product component. After analyzing the corresponding interpretations, we
find that the goals of the REQM PA may be equally served by clauses 4.1 and 4.2 of ISO
9001, as shown in the first row of Table 3. Similarly, according to [17], the Project
Planning (PP) PA involves tasks such as developing the project plan, interacting with
stakeholders appropriately, getting commitment to the plan, and maintaining the plan.
Following the same rationale of serving the same purposes, we conclude that the tasks
of the PP PA correspond to clauses 4.1, 5.1, 5.4 and 7.1 of ISO 9001, as listed in the
second row of the same table.

In the same way, after careful study, we summarize in Table 3 the result for all PAs
of the CMMI model, i.e., the cross mappings from CMMI process areas at different
capability levels to the corresponding clauses under Sections 5-8 of ISO 9001:2000.
Naturally, the illustrated mapping between ISO 9001:2000 and CMMI may seem a
subjective association. However, we reached this result on the basis of several
experimental case studies, which tried to verify its correctness and evaluate its
contribution. Due to space limitations, the detailed information is not repeated here.

Generally, ISO 9001:2000 allows an organization more flexibility in the way it
chooses to document its quality management system. ISO 9001 does not contain any
explicit requirements for the software development process, because it was originally
designed for application to a broad range of topics, including the development of
products, systems or services. In a sense, this "flexibility" makes ISO 9001 quite
difficult to implement. CMMI adds value and detail to the ISO 9001:2000 clause
descriptions [14].

3.2 CMMI Process Areas of V&V Tasks

The verification (VER) PA ensures that selected work products meet the specified 
requirements. VER is generally an incremental process, starting with 
product-component verification and usually concluding with verification of fully 
assembled products. The validation (VAL) PA incrementally validates products against 
the customer’s needs and may be performed in the operational environment or a 
simulated operational environment. 

To accomplish the generic goals of the VER and VAL PAs in a software
development process, some related process areas, including RD, REQM, and TS as
listed in [4], are required to establish the baseline infrastructure. The RD PA is needed
to generate and develop customer, product, and product-component requirements so
that requirements can be validated, while the REQM PA supports managing
requirements. Moreover, the TS PA helps transform requirements into product
specifications and supports corrective action when validation issues are identified
that affect the product or product-component design.

Thus, putting emphasis on the complete effort, these three PAs will be regarded as, 
in this paper, the primary process areas, in addition to the original VER and VAL PAs, 
as illustrated in Fig. 1. 

In addition, in software project development, both the PP and MA PAs are
usually the key to successful implementation of a variety of process areas [1].
Furthermore, from the perspective of IEEE Std 1012-1998, some other PAs such as
REQM, PPQA, CM, PI, and PMC are essential for implementing the V&V tasks.
Accordingly, the auxiliary process areas for implementing V&V tasks include PP, MA,
PPQA, CM, PI and PMC, as shown in Fig. 2. In addition to the primary process areas
described above, these auxiliary process areas are included to suggest a requisite
framework for an organization that has not yet obtained ISO 9001 registration but
desires to improve its software process in the V&V areas.




Fig. 1. The primary process areas for V&V tasks.

Fig. 2. The auxiliary process areas for V&V tasks.



4 A Framework of Minimum V&V Tasks 

In practice, software systems exhibit different levels of criticality based upon their
intended purposes and the cost impact of their failures. Considering the trade-off
between criticality level and effort, software-development organizations may
strategically choose a lower integrity level to save development effort [2] if the cost
impact of a failure, should one occur, is acceptable or negligible.

In more detail, software integrity levels denote a range of software criticality
values that are necessary to maintain risks within an acceptable limit. These software
quality metrics include safety, security, software complexity, performance, reliability,
correctness or other characteristics. Generally, critical and high-integrity software
requires a larger set and more rigorous application of V&V tasks. The IEEE
Std 1012-1998 [8] defines four software integrity levels (SIL) as illustrated in Table 1.
In the following, we employ this four-level software integrity scheme as a method to
define the minimum V&V tasks.



Table 1. Four-level scheme of software integrity level.

SIL  Criticality  Consequence   Description
4    High         Catastrophic  Loss of human life, complete mission failure, loss of system
                                security and safety, or extensive financial or social loss.
3    Major        Critical      Major and permanent injury, partial loss of mission, major
                                system damage, or major financial or social loss.
2    Moderate     Marginal      Severe injury or illness, degradation of secondary mission, or
                                some financial or social loss.
1    Low          Negligible    Minor injury or illness, minor impact on system performance,
                                or operator inconvenience.



To identify the minimum V&V tasks that apply to software systems of different
integrity levels, software developers may refer to IEEE Std 1012-1998 for the
complete list. In this paper, we limit ourselves to the non-critical commercial
applications that are most common in medium-scale organizations in Taiwan.
Thus, for systems in non-critical uses, Table 2 delineates the minimal V&V tasks
assigned to integrity level 2, together with the corresponding ISO 9001:2000 clauses,
CMMI PAs, and capability levels (CL).

In that table, we adopt the general framework that divides the whole software life
cycle (SLC) into five phases: concept, requirement, design, implementation, and test.
For the concept phase, the minimal V&V tasks are the concept-documentation
evaluation and the criticality analysis, both derived from IEEE Std 1012-1998 [8]. Each
task is mapped to the associated clauses of ISO 9001. The final column of the table
shows the corresponding PA and the capability level that will be attained after the
concerned task is performed. For instance, the concept-documentation evaluation task
in [8] has the same effect as both the REQM PA at capability level 3 and the PP PA at
capability level 2. In CMMI terminology, a capability-level 2 process is characterized
as a “managed process,” while a capability-level 3 process is a “defined process.”

A critical distinction between a managed process and a defined process is the
scope of application of the process descriptions, standards, and procedures. For a
managed process, the process descriptions, standards, and procedures are applicable to
a particular project, group, or organizational function. As a result, the managed
processes for two projects within the same organization may be very different. At the
defined capability level, in contrast, the organization is interested in deploying
standard processes that govern all related projects.

In more detail, the essential process elements for each minimal V&V task are
investigated and summarized in Table 2 in order to be ready for implementation.



Table 2. Minimal V&V tasks for the software with integrity level 2.

SLC Phase       Minimal V&V Tasks             ISO 9001:2000 Clauses                                 Associated CMMI PAs & CL
Concept         Concept documentation         7.1 Planning of product realization                   REQM (CL4), PP (CL3)
                evaluation                    7.2.1 Determination of requirements related to
                                                the product
                Criticality analysis          7.2.2 Review of requirements related to the product   REQM (CL4)
Requirement     Acceptance V&V test plan      5.4.1 Quality objectives                              PPQA (CL4), PP (CL3)
                generation and verification   5.4.2 Quality management system planning
                                              7.2.1 Determination of requirements related to
                                                the product
                                              7.5.3 Identification and traceability
                Criticality analysis          7.3.2 Design and development inputs                   RD (CL4)
                                              7.3.3 Design and development outputs
Design          Component V&V test plan       7.3.1 Design and development planning                 VER (CL4), PP (CL3), RD (CL4)
                generation and verification   7.3.5 Design and development verification
                Criticality analysis          7.3.4 Design and development review                   VER (CL4)
Implementation  Component V&V test execution  7.3.1 Design and development planning                 VER (CL4), CM (CL4), PI (CL3)
                and verification              7.3.6 Design and development validation
                Criticality analysis          7.3.7 Control of design and development changes       CM (CL4)
Test            Acceptance V&V test execution 7.5.1 Control of production and service provision     PMC (CL3), VER (CL4)
                and verification
                Acceptance V&V test procedure 5.4.1 Quality objectives                              VER (CL4), VAL (CL3), PP (CL3)
                generation and verification   5.4.2 Quality management system planning
                                              7.5.2 Validation of processes for production and
                                                service provision
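
For planning purposes, the last column of Table 2 can be reduced to the highest
capability level each process area must reach. The sketch below is one possible
encoding, not part of the proposed framework: the tuple layout and the function
`target_levels` are assumptions, and the data is transcribed from the table as printed
(the CM entry of the implementation-phase criticality analysis is read as CL4).

```python
# (SLC phase, minimal V&V task, [(process area, capability level), ...])
MINIMAL_VV_TASKS = [
    ("Concept", "Concept documentation evaluation", [("REQM", 4), ("PP", 3)]),
    ("Concept", "Criticality analysis", [("REQM", 4)]),
    ("Requirement", "Acceptance V&V test plan generation and verification",
     [("PPQA", 4), ("PP", 3)]),
    ("Requirement", "Criticality analysis", [("RD", 4)]),
    ("Design", "Component V&V test plan generation and verification",
     [("VER", 4), ("PP", 3), ("RD", 4)]),
    ("Design", "Criticality analysis", [("VER", 4)]),
    ("Implementation", "Component V&V test execution and verification",
     [("VER", 4), ("CM", 4), ("PI", 3)]),
    ("Implementation", "Criticality analysis", [("CM", 4)]),  # read as CL4
    ("Test", "Acceptance V&V test execution and verification",
     [("PMC", 3), ("VER", 4)]),
    ("Test", "Acceptance V&V test procedure generation and verification",
     [("VER", 4), ("VAL", 3), ("PP", 3)]),
]


def target_levels(tasks):
    """Return the highest capability level required for each process area."""
    levels = {}
    for _phase, _task, process_areas in tasks:
        for pa, cl in process_areas:
            levels[pa] = max(cl, levels.get(pa, 0))
    return levels


for pa, cl in sorted(target_levels(MINIMAL_VV_TASKS).items()):
    print(pa, cl)
```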



5 Conservative Roadmap to Process Improvement 

Suppose a software organization wants to improve its software process by
implementing the CMMI model. How does it choose a suitable roadmap to reach its
goal? In the following, we propose a practical and systematic sequence for middle-scale
software organizations, from the perspective of the minimum V&V effort at the second
integrity level, to save software development cost. At this stage, we set the goal of
software improvement from none to capability level 3, so that the proposed roadmap is
easy to implement for an organization that is starting to adopt the philosophy of the
CMMI model.

In general, the degree of V&V effort required for a software project depends on its
performance requirements and is not directly related to the size of the software
organization. Software integrity levels relate to the criticality of the project rather
than to the software organization. We have observed that most local middle-scale
software organizations are developing non-critical business projects. That is why we
are concerned with software projects of integrity level 2, although some
higher-integrity-level software in specific domains is kept small to reduce V&V cost
and is developed by a relatively small team.

Furthermore, considering that some software-related organizations in Taiwan
are already ISO 9001:2000 registered, the suggested roadmap has two different
options based on an organization's current situation, as illustrated in Fig. 3.




Fig. 3. The proposed roadmap for minimum V&V tasks at the integrity level two. 

As shown in Fig. 3, a software organization that is ISO 9001 registered may
directly perform the primary process areas (i.e., RD, REQM, TS, VER and VAL) to
benefit from its ISO efforts. Taking REQM as an example, the organization needs to
enhance the concept-documentation evaluation task by improving its original clause
7.1 (Planning of product realization) to capability level 4, as illustrated in the first
row of Table 2. To carry out such an improvement effectively, it must perform the
process elements specified in Table 4. Alternatively, if a software organization has no
prior ISO 9001 practice but desires to implement process improvement through the
continuous CMMI model, it has to start with the auxiliary process areas, as suggested
in Fig. 3. Accordingly, the proposed roadmap provides an obvious shortcut to process
improvement for ISO 9001 registered software organizations.

Looking into the software life cycle, the proposed framework identifies the most
important tasks for performing the minimum V&V, to ensure that the software is
developed in accordance with its functional specifications and the performance the
customer expects. The emphasis on V&V tasks stems from the fact that identifying and
correcting errors early in the development cycle is less costly than doing so in later
phases, and thus software quality is significantly improved. As a result, a middle-scale
organization will benefit greatly from software process improvement through the
proposed budget-acceptable, feasible and effective approach. Based on the analysis of
seven field observations, the productivity of the participating organizations increased
on average from 120 LOC (lines of code) per person per week to 250 LOC. The defect
injection rate generally decreased to one third owing to the early use of verification
techniques such as walkthroughs and inspections. Furthermore, the defect removal rate
for most projects improved by 37% in comparison with the historical data.

6 Summary and Conclusion 

Based on a thorough investigation of the related literature, in particular IEEE Std 1012
for software integrity levels and minimum V&V tasks in the software life cycle of
non-critical business applications, this paper summarizes best practices for achieving
software process improvement. To save software development cost, a practical roadmap
for the second integrity level is proposed. Meanwhile, we set the goal of software
improvement from none to the third capability level in the continuous-representation
CMMI model, so that the proposed roadmap is easy to implement for an organization
that is starting to adopt the philosophy of the CMMI model.

The proposed framework establishes the baseline that must be performed for software
process improvement in a software organization. Within a software project life cycle,
the effort on verification and validation is strongly emphasized to ensure that both
quality control and quality assurance are carried out as planned. Based on the
statistics of our field experiment cases, the participating organizations show some
improvement in several measurements such as productivity, defect injection rate, and
defect removal rate. The proposed roadmap provides an effective, efficient, and
economical approach whether or not a middle-scale company is ISO 9001:2000
registered.

In our future research, we will extend the current results to include the project goal
of reaching capability level 4 under the integrity requirement at level 3.

Table 3. Comparison of CMMI process areas and ISO sections 5-8.

CMMI process areas (acronyms):

Requirement Management (REQM)
Project Planning (PP)
Project Monitoring and Control (PMC)
Supplier Agreement Management (SAM)
Measurement and Analysis (MA)
Process and Product Quality Assurance (PPQA)
Configuration Management (CM)
Requirement Development (RD)
Technical Solution (TS)
Product Integration (PI)
Verification (VER)
Validation (VAL)
Organizational Process Focus (OPF)
Organizational Process Definition (OPD)
Organization Training (OT)
Integrated Project Management (IPM)
Integrated Supplier Management (ISM)
Risk Management (RSKM)

Capability level mapping: 0 1 2 3 4 5

ISO 9001:2000 clauses:

4.1 General requirements
4.2 Documentation requirements
4.1 General requirements
5.1 Management commitment
5.4 Planning
7.1 Planning of realization processes
4.1 General requirements
5.1 Management commitment
8.2 Measurement and monitoring
4.1 General requirements
7.4 Purchasing

7.5 Production and service operations
8.2 Measurement and monitoring
8.4 Analysis of data
4.1 General requirements
5.1 Management commitment
5.2 Customer focus
5.3 Quality policy
4.2 Documentation requirements
7.3 Design and/or development
7.5 Production and service operations
5.2 Customer focus
7.2 Customer-related processes
7.3 Design and/or development
7.3 Design and/or development

7.3 Design and/or development

7.1 Planning of realization processes
7.3 Design and/or development
7.5 Production and service operations
7.1 Planning of realization processes
7.3 Design and/or development
5.5 Administration

5.3 Quality policy
5.4 Planning
5.5 Administration
6.2 Human resources

5.4 Planning
5.5 Administration
7.1 Planning of realization processes
6.1 Provision of resources
7.4 Purchasing
5.1 Management commitment

CMMI process areas (acronyms) with their ISO 9001:2000 clauses:

Decision Analysis and Resolution (DAR): 8.2 Measurement and monitoring; 8.4 Analysis of data
Organizational Environment for Integration (OEI): 6.3 Facilities; 6.4 Work environment
Integrated Teaming (IT): 6.2 Human resources
Organization Process Performance (OPP): 5.6 Management review; 8.2 Measurement and monitoring; 8.4 Analysis of data; 8.5 Improvement
Quantitative Project Management (QPM): 5.6 Management review; 8.1 Planning; 8.2 Measurement and monitoring; 8.4 Analysis of data
Causal Analysis and Resolution (CAR): 8.1 Planning; 8.5 Improvement
Organization Innovation and Deployment (OID): 8.1 Planning; 8.5 Improvement



Table 4. Process elements for various V&V activities.

SLC Phase: Concept
V&V Task: Concept Documentation Evaluation
Task Purposes: 1. Verify the allocation of system requirements. 2. Validate the selected solution, and ensure that no false assumptions have been incorporated in the solution.
Required Inputs: Concept Documentation, Supplier Development Plans and Schedules, User Needs, Acquisition Needs
Task Reports: Concept Documentation Evaluation Reports, Anomaly Reports

SLC Phase: Concept
V&V Task: Criticality Analysis
Task Purposes: 1. Determine whether software integrity levels are established, and verify that the assigned software integrity levels are correct. 2. Document the software integrity level assigned to individual software components.
Required Inputs: Concept Documentation, Developer integrity level assignments
Task Reports: Software Integrity Level Reports, Criticality Analysis, Anomaly Reports

SLC Phase: Requirements
V&V Task: Acceptance V&V Test Plan Generation and Verification
Task Purposes: 1. Ensure the correctness, completeness, accuracy, testability, and consistency of the requirements.
Required Inputs: Concept Documentation, SRS, IRS, User Documentation, Acceptance Test Plan
Task Reports: Acceptance V&V Test Plan, Anomaly Reports

SLC Phase: Requirements
V&V Task: Criticality Analysis
Task Purposes: 1. Review and update the existing criticality analysis results from the prior Criticality Task Report using the SRS and IRS.
Required Inputs: Criticality, SRS, IRS
Task Reports: Criticality Analysis, Anomaly Reports

SLC Phase: Design
V&V Task: Component V&V Test Plan Generation and Verification
Task Purposes: 1. Demonstrate that the design is a correct, accurate, and complete transformation of the software requirements and that no unintended features are introduced.
Required Inputs: SRS, SDD, IRS, IDD, Component Test Plan
Task Reports: Component V&V Test Plan, Anomaly Reports

SLC Phase: Design
V&V Task: Criticality Analysis
Task Purposes: 1. Review and update the existing criticality analysis results from the prior Criticality Task Report using the SDD and IDD.
Required Inputs: Criticality, SDD, IDD
Task Reports: Criticality Analysis, Anomaly Reports

SLC Phase: Implementation
V&V Task: Component V&V Test Execution and Verification
Task Purposes: 1. Verify and validate that these transformations are correct, accurate, and complete.
Required Inputs: Source Code, Executable Code, SDD, IDD, Component Test Plans, Component Test Procedures, Component Test Results
Task Reports: Test Results, Anomaly Reports

SLC Phase: Implementation
V&V Task: Criticality Analysis
Task Purposes: 1. Review and update the existing criticality analysis results from the prior Criticality Task Report using the source code.
Required Inputs: Criticality, Source Code
Task Reports: Criticality Analysis, Anomaly Reports

SLC Phase: Test
V&V Task: Acceptance V&V Test Procedure Generation and Verification
Task Purposes: 1. Verify that the developer's Acceptance Test Procedures conform to the project-defined test document purpose, format, and content.
Required Inputs: SDD, IDD, Source Code, User Documentation, Acceptance Test Plan, Acceptance Test Procedures
Task Reports: Acceptance V&V Test Procedures, Anomaly Reports

SLC Phase: Test
V&V Task: Acceptance V&V Test Execution and Verification
Task Purposes: 1. Use the developer's acceptance test results to verify that the software satisfies the test acceptance criteria.
Required Inputs: Source Code, Executable Code, Integration Test Plan, Integration Test Procedures, Integration Test Results
Task Reports: Use the developer's acceptance test results to verify that the software satisfies the test acceptance criteria.

Abstract. With the integration of computer technology, consumer
products, and communication facilities, the software in an embedded
system now accounts for as much as 70% of total system functionality.
In this paper, we propose a methodology called RCGES (Retargetable
Code Generation for Embedded Systems) for automatic code generation
on retargetable embedded systems, which addresses two issues. First,
embedded C code for an embedded processor is generated automatically
from a specification written in ANSI (American National Standards
Institute) C using our proposed code generation algorithm. Second, we
develop a graphical user interface to configure the parameters of the
target processor of the embedded system. Two embedded system examples,
8051-based and PIC (Peripheral Interface Controller)-based, are used to
illustrate the feasibility of the RCGES methodology.



1 Introduction 

As embedded system requirements become increasingly demanding, five issues
need attention: analysis, software development, verification and validation,
test and debug, and time to market. The main issue in developing embedded
software is that the software needs to be rewritten when the processor in an
embedded system is changed. In other words, when embedded software in C code
is to be downloaded to a different embedded processor, such as an 8051 or a
PIC (Peripheral Interface Controller), designers have to tediously repeat the
whole design. Thus, designers need to spend more effort modifying the program
code of embedded systems. In this work, we propose a code generation
methodology to solve this software development issue for retargetable
embedded systems.

Several methods can be used to generate retargetable code. Rettberg [1]
proposed a useful flowchart based on state flow models to generate code for
multiple inputs into a single output. By using a simple tree pattern matching
algorithm for code generation, Chen [2] reduced the matching time by 69%. In
2002, Leupers [3] mentioned four major problems for embedded processors: a
huge programming effort and, compared with C or C++, far less code
portability, maintainability, and dependability. To solve these problems, he
used retargetable C and C++ compilers to achieve the goal of retargetability.

* This work was supported in part by a project grant NSC93-2215-E-027-006 from the
National Science Council, Taiwan.

With RCGES (Retargetable Code Generation for Embedded Systems), we aim to
provide the fields involved with valuable benefits, which are summarized as
follows. First, we develop a friendly graphical user interface to set the
parameters of the target processor of the embedded system. Next, we develop a
code generation algorithm for software synthesis. Finally, an automatic
software synthesis tool is used to solve the retargetable software development
issue of embedded systems. The proposed RCGES methodology will be illustrated
using two examples: LED control of an advertisement display and four-phase
stepping motor control (FPSMC). Details are given in Section 4.

This paper is organized as follows. Section 2 reviews previous work. Section 3
describes the RCGES framework and design flow and introduces a code generation
algorithm to solve the stated problems. In Section 4, two embedded system
examples are used to illustrate the feasibility of the RCGES methodology.
Section 5 concludes this paper and gives some future work.



2 Previous Work 

Because retargetable code generation for embedded systems is a significant
issue, several techniques [1]-[8] have recently been proposed. Leupers [4]
provided a survey of methods and techniques dedicated to efficient code
generation for embedded processors. Lee [5] introduced the Interrupt Time
Petri Nets (ITPN) model and the Interrupt-Based Quasi-Dynamic Scheduling
(IQDS) algorithm to model embedded systems with interrupts and find task
schedules that satisfy time constraints. However, this work [5] lacks
flexibility, efficiency, and productivity; that is, the approach may fail on
embedded systems other than the 8051 micro-controller.

Hsiung [6] proposed TEQSS (Time-Extended Quasi-Static Scheduling algorithm)
to synthesize real-time embedded software code from a set of Time
Complex-Choice Petri Nets. TEQSS mainly focuses on meeting memory and time
constraints and did not address retargetable code generation for embedded
systems. In 2004, Lee and Hsiung [8] proposed an Embedded Software Synthesis
and Prototyping (ESSP) methodology covering software synthesis, software
verification, code generation, and system emulation.

Manohar and Bhatia [9] proposed a tool for automated code generation for
designing user interfaces on character terminals; this user interface (UI)
tool spares designers from dealing with complex library calls. The tool was
inconvenient to use, however, because it did not support mouse events.

Charot and Messe [10] proposed a flexible code generation framework for the
design of application-specific programmable processors, which used library
modules to implement flexible compilation passes such as code generation and
scheduling. The framework consisted of two levels: one provided retargeting
modules that defined the compilation flow, and the other allowed the user to
select and link modules from the library to build a compiler. Unfortunately,
the library was still incomplete when the paper was published.



3 RCGES Design Methodology 

The growing number of embedded system applications makes code generation
increasingly important. To cope with the variety of applications in embedded
systems, retargetable code generation is seen as the most significant issue,
one that most researchers are eager to solve. In this section, we propose a
code generation methodology called RCGES for developing retargetable embedded
software that executes successfully on 8051-based and PIC-based embedded
processors. We introduce the design methodology of RCGES, including its
structure, design flow, algorithm, and GUI (Graphical User Interface).



3.1 Framework and Design Flow of RCGES 

The framework of RCGES is divided into three parts: the front-end, the main
program, and the subroutines. In the front-end, several settings are made,
including the input/output ports of the target system and system
initialization. The second part, the main program, is the core of RCGES. Its
purpose is to translate ANSI C code into embedded C code for the target and to
add the interrupt, timer function, and input/output port settings to the
generated code. The final part, the subroutines, processes all functions in
the ANSI C code other than the main function and adds the interrupt vector and
timer functions to the generated code.

The RCGES design flow is shown in Fig. 1. It consists of the following six
design steps:

Step (1) Initialization: choose the type of processor in the target embedded
system, then initialize and set up the input/output ports of the embedded
processor.

Step (2) Parsing: parse the syntax, variables, keywords, operators, and
operands.

Step (3) Set Parameter: use a user-friendly interface to set the interrupt
vector, timer, and I/O interface for the retargetable embedded processor.

Step (4) Transform the source C code into the embedded C code of the target
processor using the RCGES code generation algorithm.

Step (5) Test: the output code of the RCGES code generation algorithm is
tested with the Keil C compiler [11] for 8051-based embedded systems and the
Hi-Tech compiler [12] for PIC-based embedded systems.

Step (6) Verification: we use two emulation platforms of the WINICE [13]
series, 8051 and PIC16F, to verify the two output codes, respectively.
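
As a minimal sketch of how steps (1) and (4) might fit together, the fragment below selects a target and emits a port write for it. The type and function names (Target, TargetInfo, target_info, emit_port_write) are hypothetical and are not part of RCGES; only the header names reg51.h and pic.h and the port names P1 and PORTD are taken from the listings in this paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical target descriptor; only the two processors named in the text. */
typedef enum { TARGET_8051, TARGET_PIC } Target;

typedef struct {
    const char *header;    /* device header to include */
    const char *out_port;  /* register that drives the output pins */
} TargetInfo;

/* Step (1): choosing the processor selects its header and output port. */
const TargetInfo *target_info(Target t)
{
    static const TargetInfo targets[] = {
        { "reg51.h", "P1" },     /* 8051-based system */
        { "pic.h",   "PORTD" },  /* PIC-based system */
    };
    assert(t == TARGET_8051 || t == TARGET_PIC);
    return &targets[t];
}

/* Step (4): emit "<port> = <expr>;" for the chosen target.
   Returns the number of characters written, or -1 if out is too small. */
int emit_port_write(Target t, const char *expr, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s = %s;", target_info(t)->out_port, expr);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

Under these assumptions, the same source expression table[i] would be emitted as a write to P1 for the 8051 and as a write to PORTD for the PIC, which is the only difference between the loop bodies of the two generated listings.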

3.2 Code Generation Algorithm 

In the process of developing the software, the program code is generated
automatically for the different target embedded systems. Therefore, system
designers are only required to import ANSI C code, and the embedded C code for
the target is generated by our proposed code generation algorithm. The
difference between ANSI C code and embedded C code is that parameters such as
the input/output ports, timers, and interrupts need to be added to the former.

Fig. 1. Design Flow of RCGES

The code generation algorithm of RCGES is shown in Table 1. Its purpose is to
generate embedded C code automatically from ANSI C code. In Table 1, step (1)
opens the source file and assigns a name to the output file. Step (2) sets the
initial conditions of the embedded system, such as the initial input/output
values and whether interrupts and timers are enabled. Step (3) sets the
interrupt, timer, and input/output parameters according to the design
requirements. The core of the parsing program is performed in steps (4) to
(6). The algorithm has to check what kind of token it has read: a keyword, a
comment, a header file of the C code, and so on, or a variable. In step (7),
we transform tokens into equations or flow control statements. For example, in
a = b + c, each of a, b, c, =, and + is a token. Step (8) maps parameters,
such as the interrupt, timer, variable type, input/output, and interrupt or
timer subroutines, into program code. Step (9) is the translation function.
While ANSI C code is translated into embedded C code, an RTL (Register
Transfer Level) tree model, such as a binary expression tree, is built for
RCGES; it handles statements that relate variables of the ANSI C code to
input/output ports of the embedded C code, such as variable mapping,
selection, and loop functions. The RTL program is a series of expression
trees, which are transformed into postfix order for bottom-up
comparison [2].
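
The postfix transformation can be illustrated with the statement a = b + c used above. The following is only a sketch under our own assumptions: the Node type and both function names are hypothetical, and single characters stand in for tokens. It is not the RTL tree model of RCGES itself, merely a post-order walk over a binary expression tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: a leaf holds a variable, an inner node an operator. */
typedef struct Node {
    char symbol;               /* 'a', 'b', '+', '=', ... */
    const struct Node *left;   /* NULL for leaves */
    const struct Node *right;
} Node;

/* Post-order walk: both operands are written before their operator.
   Returns the new length of the string in out. */
size_t to_postfix(const Node *n, char *out, size_t pos, size_t cap)
{
    if (n == NULL)
        return pos;
    pos = to_postfix(n->left, out, pos, cap);
    pos = to_postfix(n->right, out, pos, cap);
    assert(pos + 1 < cap);     /* keep room for the terminator */
    out[pos++] = n->symbol;
    out[pos] = '\0';
    return pos;
}

/* Builds the tree for a = b + c and writes its postfix form into out. */
size_t postfix_of_assignment(char *out, size_t cap)
{
    const Node a = { 'a', NULL, NULL };
    const Node b = { 'b', NULL, NULL };
    const Node c = { 'c', NULL, NULL };
    const Node sum = { '+', &b, &c };
    const Node assign = { '=', &a, &sum };

    assert(cap > 0);
    out[0] = '\0';
    return to_postfix(&assign, out, 0, cap);
}
```

In postfix order every operator follows its operands, so a bottom-up matcher sees the operands b and c before it has to decide how to cover the + node.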
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Table 1. Code Generation Algorithm 



Procedure Code Generation
Begin
    Open file ( );                                                    (1)
    Initialization ( );                                               (2)
    Parameter Setting ( ); //Set the interrupt vector, timer,
                           //I/O port type, etc.                      (3)
    While (file is not ending)                                        (4)
        GetToken ( ); //get the variable from the file                (5)
        if (Token == keyword) //include, int, char, while, for        (6)
        {   if (Token == Selection) //if, case, else, etc.
                Selection ( ); //parse if, else, and get the condition
            else if (Token == Loop) //while, for ...
                Loop ( ); //get the loop times and ending condition
            else if (Token == HeadFile) //include
                Headfile ( ); //get the head file
            else if (Token == Declare)
                Declare ( ); //store the variable, array, and subroutine
            else Comment ( );
        } //end of if
        else Variable ( );
    End; //end of While                                               (7)
    Statement ( );                                                    (8)
    Translation ( );                                                  (9)
End; //end of Begin
End; //end of Procedure
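
The keyword dispatch in step (6) of Table 1 can be sketched in C as follows. This is an illustration only: the enumeration, the keyword lists, and the function names are our assumptions rather than the RCGES implementation, and the Comment branch of Table 1 is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOK_SELECTION, TOK_LOOP, TOK_HEADFILE, TOK_DECLARE, TOK_VARIABLE
} TokenKind;

/* Returns 1 if word equals one of the n entries in list. */
static int in_list(const char *word, const char *const *list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(word, list[i]) == 0)
            return 1;
    return 0;
}

/* Mirrors the if / else-if chain of step (6) in Table 1. */
TokenKind classify_token(const char *tok)
{
    static const char *const selection[] = { "if", "else", "switch", "case" };
    static const char *const loop[]      = { "while", "for", "do" };
    static const char *const declare[]   = { "int", "char", "void" };

    assert(tok != NULL);
    if (in_list(tok, selection, sizeof selection / sizeof selection[0]))
        return TOK_SELECTION;
    if (in_list(tok, loop, sizeof loop / sizeof loop[0]))
        return TOK_LOOP;
    if (strcmp(tok, "#include") == 0)
        return TOK_HEADFILE;
    if (in_list(tok, declare, sizeof declare / sizeof declare[0]))
        return TOK_DECLARE;
    return TOK_VARIABLE;   /* anything that is not a keyword */
}
```

Each returned kind corresponds to one handler in Table 1 (Selection, Loop, Headfile, Declare, or Variable), so a driver loop would call classify_token once per token until the end of the file.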





3.3 Graphical User Interface of RCGES 

In the RCGES methodology, a graphical user interface (GUI) is provided for
selecting an embedded processor and setting its parameters. The GUIs for
8051-based and PIC-based embedded systems are shown in Fig. 2 and Fig. 3,
respectively. The main operation has three steps. Step (1) is to open a file
and select the target processor: designers open a source file from the 'File'
menu and select the target processor. Step (2) is to set the initial
conditions of the embedded processor, including the interrupt, timer, and
input/output port parameters. The interrupt modes include int0 and int1,
triggered by low level or negative edge. For timer0 and timer1, designers can
also set the timer priority and the counter value. Step (3) is to generate and
display the retargetable embedded software.

4 Embedded System Examples 

In this section, we use two embedded system examples to illustrate our
proposed RCGES methodology: LED control of an advertisement display and
four-phase stepping motor control (FPSMC).

The experimental environment consists of three parts. First, a GUI is used for
setting the parameters of 8051-based and PIC-based embedded systems, as shown
in Fig. 2 and Fig. 3, respectively. Second, two compilers, namely the Keil C
compiler [11] and Hi-Tech C [12], are used to produce executable code for the
embedded systems. Finally, a WINICE [13] emulation board with 8051 and PIC
embedded processors is used to verify the function of the embedded software.
The WINICE emulation board for the 8051 has an 80(C)51/52 CPU (Central
Processing Unit), a 16 MHz working frequency, 64 Kbyte program memory, 64
Kbyte data memory, and a parallel transmission port interface. The PIC
emulation board has a 16F877 CPU, a 20 MHz working frequency, and 8 Kword
memory. In this experiment, the timer, interrupt, and input/output port
functionality is tested while the system is in the initialization state. Once
all of this functionality works correctly, the two embedded system examples
are run through RCGES.

Fig. 2. GUI for 8051-based Embedded Systems

Fig. 3. GUI for PIC-based Embedded Systems

The first example is LED display control for an advertisement: eight LEDs turn
on sequentially from left to right or from right to left, controlled by an
interrupt or a timer in the embedded system. The system designer first writes
ANSI C code as the main program, which is shown in Table 2. The designer then
uses our proposed graphical user interface to set parameters such as the
initialization, interrupt, timer mode, or input/output values according to the
system constraints. Applying RCGES to generate code for the 8051-based
embedded system is shown in Fig. 2. The same method is applied to the
PIC-based embedded system, and the generated code is shown in Fig. 3. The main
program defines a table of values that are output to port P1 on the 8051 or
port PORTD on the PIC while waiting for the interrupt or timer mode. If the
interrupt is '0', the LEDs turn on sequentially from left to right. In
contrast, if the timer mode is set to '0', the LEDs turn on from right to
left. The embedded C code of the first example for the 8051-based and
PIC-based systems is shown in Table 3 and Table 4, respectively. Furthermore,
the initial values, interrupts, and timer mode are also generated by RCGES.



Table 2. ANSI C Code for LED Display Control of Advertisement

#include <stdio.h>
#include <stdlib.h>

void delay(int);

int main(void)
{
    int i, temp;
    char table[8] = {0xe7, 0xc3, 0x81, 0x00, 0x81, 0xc3, 0xe7, 0xff};

    while (1)
    {
        for (i = 0; i <= 7; i++)
        {
            temp = table[i];    /* current LED pattern */
            delay(10000);
        }
    }
}

/* Busy-wait delay loop. */
void delay(int count)
{
    int i, j;

    for (i = 0; i < count; i++)
        for (j = 0; j < 1500; j++)
            ;
}

Table 3. Embedded C Code for 8051-based Embedded Systems

#include <reg51.h>

void delay(int);

int k = 2000;
char OUT = 0x80;

void main(void)
{
    int i, temp;
    char table[8] = {0xe7, 0xc3, 0x81, 0x00, 0x81, 0xc3, 0xe7, 0xff};

    P1 = 0xF8;                              /* 11111000 in binary */
    ET1 = 1; PT1 = 1; TMOD = 0x02;
    TH0 = (256 - 250); TL0 = (256 - 250);
    TR1 = 1; EX0 = 1; PX0 = 1; IE0 = 1;
    while (1)
    {
        for (i = 0; i <= 7; i++)
        {
            P1 = table[i];                  /* LED pattern goes to port P1 */
            delay(10000);
        }
    }
}

Another example is four-phase stepping motor control (FPSMC). The function of
FPSMC is to use the interrupt and the timer to control the direction of the
motor. If the timer mode is set to '0', the motor turns clockwise. Conversely,
if the interrupt is '0', the motor turns counterclockwise. The source ANSI C
code for this example is shown in Table 5. Using the same methodology, various
parameters are set through the GUIs in Fig. 2 and Fig. 3. Finally, the
embedded C code for each target is generated automatically by the RCGES
methodology, as shown in Table 6 and Table 7 for the 8051-based and PIC-based
embedded systems, respectively. To verify the code retargeted by RCGES, we
compile it with the Keil C compiler for the 8051 embedded processor and the
Hi-Tech C compiler for the PIC embedded processor. WINICE emulation boards for
the 8051 and PIC16/17 are also used to verify the feasibility of RCGES.
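
To make the direction control concrete, the sketch below steps a one-phase-on drive pattern through the four motor phases. It is an illustration under our own assumptions: the function name next_phase is hypothetical, the four phases are taken to be the low four bits of the output port, and which rotation counts as clockwise depends on how the motor is wired.

```c
#include <assert.h>
#include <stdint.h>

/* One-phase-on drive: exactly one of the four low bits is set at a time. */
uint8_t next_phase(uint8_t phase, int clockwise)
{
    assert(phase == 0x01 || phase == 0x02 || phase == 0x04 || phase == 0x08);
    if (clockwise)
        return (phase == 0x08) ? 0x01 : (uint8_t)(phase << 1);  /* wrap at the top */
    return (phase == 0x01) ? 0x08 : (uint8_t)(phase >> 1);      /* wrap at the bottom */
}
```

Writing the returned value to the output port on every timer tick advances the motor by one step; reversing the shift direction reverses the rotation.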

The experimental results show that the code generated by RCGES passes
verification both with the compilers and on the 8051 and PIC WINICE emulation
boards. For the examples with external circuits, we not only downloaded the
code generated by RCGES into the retargetable embedded systems but also
verified the circuit functionality.

5 Conclusion and Future Work 

A code generation methodology called RCGES (Retargetable Code Generation for
Embedded Systems) was proposed to generate embedded C code automatically from
given ANSI C code and to provide a GUI for setting the parameters of embedded
processors. Moreover, we have shown the feasibility of RCGES through two
experiments on WINICE emulation boards. In the future, we will focus on
automatic code generation for ARM-based and DSP embedded systems.

Table 4. Embedded C Code for PIC-based Embedded Systems

#include <pic.h>
#include "cnfig877a.h"

void delay(int);

void main(void)
{
    int i;
    char table[8] = {0xe7, 0xc3, 0x81, 0x00, 0x81, 0xc3, 0xe7, 0xff};

    TRISD = 0x00;             /* PORTD pins as outputs */
    PORTD = 0b00000001;
    T2CON = 0b01111110;
    TMR2IE = 1;               /* enable the Timer2 interrupt */
    PEIE = 1;                 /* enable peripheral interrupts */
    GIE = 1;                  /* enable global interrupts */
    PR2 = 155;                /* Timer2 period register */
    while (1)
    {
        for (i = 0; i <= 7; i++)
        {
            PORTD = table[i]; /* LED pattern goes to port PORTD */
            delay(10000);
        }
    }
}



Table 5. ANSI C Code for FPSMC

#include <stdio.h>
#include <stdlib.h>

void delay(int);

int main(void)
{
    int i, j, temp;
    char table[8] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80};

    for (j = 0; j < 10; j++)
    {
        for (i = 0; i <= 7; i++)    /* forward pass through the table */
        {
            temp = table[i];
            delay(10000);
        }
        for (i = 7; i >= 0; i--)    /* backward pass through the table */
        {
            temp = table[i];
            delay(10000);
        }
        j = 0;                      /* reset the counter so the sequence repeats */
    }
}

Table 6. Partial Embedded C Code of FPSMC for 8051-based Embedded Systems

void T0_int(void) interrupt 1
{
    TH0 = (65536 - 5000) / 256;
    TL0 = (65536 - 5000) % 256;
    if (--k == 0) {
        k = 100; P1 = step;
        if (step == 0x10) {
            step = 0x01;
            k = 100;
            while (--k == 0)
            {
                P1 = step;
                step >>= 1;
                if (step == 0x10)
                    step = 0x01;
            }
        }
    }
}

void delay(int count)
{
    int i, j;

    for (i = 0; i < count; i++)
        for (j = 0; j < 1500; j++) { ; }
}
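
The two reload expressions at the top of T0_int in Table 6 split one 16-bit start value into a high and a low byte. As a worked sketch, assuming a 16-bit up-counter that raises its interrupt when it overflows at 65536, the helper below reproduces that arithmetic; the type TimerReload and the function timer16_reload are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t high;   /* value for the high byte register (TH0 in Table 6) */
    uint8_t low;    /* value for the low byte register (TL0 in Table 6) */
} TimerReload;

/* Start value for a 16-bit up-counter that should overflow after `ticks` counts. */
TimerReload timer16_reload(uint32_t ticks)
{
    TimerReload r;
    uint32_t start;

    assert(ticks >= 1 && ticks <= 65536);
    start = 65536u - ticks;              /* counts remaining until overflow */
    r.high = (uint8_t)(start / 256u);    /* same as (65536 - ticks) / 256 */
    r.low  = (uint8_t)(start % 256u);    /* same as (65536 - ticks) % 256 */
    return r;
}
```

For the 5000 counts used in Table 6, the start value is 65536 - 5000 = 60536, which gives a high byte of 236 (0xEC) and a low byte of 120 (0x78).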



Table 7. Partial Embedded C Code of FPSMC for PIC-based Embedded Systems

void delay(int count)
{
    int i, j;

    for (i = 0; i < count; i++)
        for (j = 0; j < 1500; j++) { ; }
}

void interrupt isr_Sevr(void)
{
    TMR2IF = 0;
    if (--k == 0) {
        k = 100;
        PORTD = step;
        if (step == 0x10) {
            step = 0x01;
            k = 100;
            while (--k == 0) {
                PORTD = step;
                step >>= 1;
                if (step == 0x10) {
                    step = 0x01;
                }
            }
        }
    }
}

Abstract. Embedded systems are composed of a heterogeneous collection
of digital, analog, and mixed-signal hardware components. This paper
presents a method for the verification of systems composed of such a
variety of components. This method utilizes a new model, timed hybrid
Petri nets (THPN), to model these circuits. In particular, this paper
describes an efficient, approximate algorithm to find the reachable
states of a THPN model. Using this state space, desired properties
specified in ACTL are verified. To demonstrate these methodologies, a
few hybrid automata benchmarks, a tunnel diode oscillator, and a
phase-locked loop are modeled and analyzed using THPNs.



1 Introduction 

Embedded systems are pervasive in today’s society and are often used in safety 
critical situations. Therefore, verification of embedded systems is of extreme im- 
portance. The verification of embedded systems is complicated by the fact that 
these systems are typically a hybrid of both digital hardware and physical sys- 
tems (such as sensors, actuators, and plants) which are analog or mixed-signal 
by nature. While there has been substantial success in recent years in the formal 
verification of the digital components, there has been relatively little research in 
the formal verification of analog and mixed-signal components. In analog and 
mixed-signal circuits, the state space often includes several continuous variables, 
such as voltages and currents, which must be tracked. These continuous vari- 
ables make the state space extremely large and even more difficult to analyze 
than purely discrete systems. Therefore, the major goal of this work is to de- 
velop efficient, approximate modeling and analysis techniques that are capable 
of preserving the behavior necessary to verify correctness. 

Recently, there have been attempts to develop methods to formally verify 
analog circuits [1,2, 3, 4, 5, 6, 7, 8]. Much of this work comes from Hartong, Hedrich, 
and Barke. In their work, they divide the continuous state space into regions
which are represented in a Boolean manner. From this decomposition, they create
a transition relation by selecting test points in each region to determine reachable 
next states. This Boolean abstraction allows them to perform model checking 
using standard Boolean based approaches. While a promising approach, this 
technique loses significant accuracy in the abstraction to a Boolean model. 
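
The accuracy loss of such a discretization can be seen in a small sketch. The function below maps a continuous value to the index of the region that contains it; it assumes a uniform grid over a bounded interval, which is our simplification for illustration and not necessarily the decomposition used in the cited work. The name region_index is hypothetical.

```c
#include <assert.h>

/* Index of the region containing v when [min, max] is cut into
   `regions` intervals of equal width. */
int region_index(double v, double min, double max, int regions)
{
    int idx;

    assert(regions > 0 && min < max && v >= min && v <= max);
    idx = (int)((v - min) / (max - min) * regions);
    return (idx == regions) ? regions - 1 : idx;   /* v == max falls in the last region */
}
```

All values that fall into the same region become indistinguishable after the abstraction, so the information about where in the region a voltage or current lies is lost.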

Our previous work on verifying digital circuits utilizes time/timed Petri net 
models to represent the circuit’s behavior [9,10]. In order to model analog and 
mixed-signal circuits, this paper introduces the timed hybrid Petri net (THPN) 
which supports continuous variables to represent currents and voltages. Alterna- 
tively, hybrid automata models [11,12,13] could be used, and the analysis meth- 
ods presented in this paper could certainly be adapted to a class of hybrid 
automata models, if desired. 

Analog circuits are traditionally represented using differential equations. This 
paper, therefore, presents a method to translate differential equation models into 
THPNs with the goal of minimizing the size of the models. While efficient sim- 
ulation methods exist for systems of differential equations, these methods only 
produce results from a single initial condition and assume deterministic behav- 
ior. THPNs are capable of modeling systems over a range of initial conditions 
and over all possible non-deterministically chosen runs of the system. 

To verify properties about the given systems, reachability analysis must be 
performed on the THPN. Many different methods for reachability of hybrid mod- 
els have been proposed [11,12,13,14,15,16,17,18,19]. Our method adapts a zone 
based algorithm [10] to perform the reachability analysis of the THPN model. 
Although reachability on THPNs is certainly undecidable [12,20] and to achieve 
efficiency our analysis method is conservative, our preliminary studies indicate 
that our method can be accurate enough in practice. This is demonstrated by 
the analysis of a few hybrid automata benchmarks, a tunnel diode oscillator, 
and a model of a phase-locked loop (PLL). 

2 Timed Hybrid Petri Nets 

Recently, various extensions to Petri nets have been proposed to develop hybrid 
Petri nets [15,21,22,23]. The THPN model that we propose to use for verify- 
ing analog and mixed-signal circuits is most similar to the Hybrid Net Condi- 
tion/Event System Model proposed by Chen et. al [15] though with both limi- 
tations and extensions. This section introduces a discrete Petri net, followed by 
a continuous Petri net, and finally how they can be composed to form THPNs. 

A discrete Petri net is a tuple $N_d = \langle P_d, T_d, F_d, m_0 \rangle$ such that:

$P_d$ is a finite set of discrete places;

$T_d$ is a finite set of discrete transitions;

$F_d \subseteq (P_d \times T_d) \cup (T_d \times P_d)$ is the flow relation;

$m_0 \subseteq P_d$ is the set of initially marked places.
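
A discrete Petri net of this kind can be sketched with bit sets. Because the initial marking is a set of places, a marking is modelled here as one bit per place. The firing rule shown (a transition is enabled when all of its input places are marked, and firing moves the marks to its output places) is the usual one for such nets and is our assumption, since the firing semantics are not stated in this section; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Marking;      /* bit i set <=> discrete place i is marked */

typedef struct {
    Marking pre;               /* input places of the transition */
    Marking post;              /* output places of the transition */
} Transition;

/* A transition is enabled when every input place is marked. */
int is_enabled(Marking m, Transition t)
{
    return (m & t.pre) == t.pre;
}

/* Firing unmarks the input places and marks the output places. */
Marking fire(Marking m, Transition t)
{
    assert(is_enabled(m, t));
    return (m & ~t.pre) | t.post;
}
```

With place 0 initially marked and a transition from place 0 to place 1, firing the transition leaves only place 1 marked, after which the transition is no longer enabled.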

Graphically, discrete places are depicted as circles, discrete transitions as solid
boxes, and markings as solid circles (see place pil and transition Vin+ in the
THPN for the PLL shown in Fig. 1 where pil is initially marked).

Fig. 1. THPN model for a PLL (unlabeled edges have [0,0] delay).



A continuous Petri net is a tuple $N_c = \langle P_c, T_c, F_c, R, X_0 \rangle$ such that:

$P_c$ is a finite set of continuous places;

$T_c$ is a finite set of continuous transitions;

$F_c \subseteq (P_c \times T_c) \cup (T_c \times P_c)$ is the flow relation;

$R : T_c \to \mathbb{Q}$ is the flow rate of transitions;

$X_0 : P_c \to (\{-\infty\} \cup \mathbb{Q}) \times (\mathbb{Q} \cup \{\infty\})$ assigns a range of initial marking values
$(x_{0l}(p), x_{0u}(p))$ to the continuous variables.

Note that $\mathbb{Q}$ denotes the rational numbers. Also, $X_0$ is a range allowing
uncertainty in the initial value of the continuous variables. This is one source of
non-determinism in the THPN model. Graphically, continuous places are depicted
as double circles, continuous transitions are depicted as empty boxes, and each
continuous transition is annotated with its flow rate (see continuous place vl
and continuous transition dl with rate 1 in Fig. 1).

Using the above definitions, a timed hybrid Petri net (THPN) can now be
defined as a tuple $N_h = \langle N_d, N_c, F_h, B, D, A \rangle$ such that:

$N_d$ : is a discrete Petri net as defined previously;

$N_c$ : is a continuous Petri net as defined previously;

$F_h \subseteq (P_c \times T_d) \cup (P_d \times T_c) \cup (T_d \times P_c)$ is the flow relation between the discrete
Petri net and the continuous Petri net;

$B : (P_c \times T_d) \rightarrow (\{-\infty\} \cup Q) \times (Q \cup \{\infty\})$ assigns a predicate $[b_l(p,t), b_u(p,t)]$ to
arcs from continuous places to discrete transitions where $b_l(p,t) \le b_u(p,t)$;

$D : ((P_d \cup P_c) \times T_d) \rightarrow Q^+ \times (Q^+ \cup \{\infty\})$ assigns a delay $[d_l(p,t), d_u(p,t)]$ to
each arc between a place and a discrete transition where $d_l(p,t) \le d_u(p,t)$;

$A : (T_d \times P_c) \rightarrow (\{-\infty\} \cup Q) \times (Q \cup \{\infty\})$ specifies an assignment value
$[a_l(t,p), a_u(t,p)]$ on arcs from discrete transitions to continuous places.

Note that $Q^+$ denotes the non-negative rationals. Also, B, D, and A are only defined
for arcs specified in the corresponding flow relation. Arcs added to the THPN 
by the flow relation are denoted with dashed lines. An arc from a continuous 
place to a discrete transition is annotated with a predicate and a bounded delay 
assignment. Arcs from a discrete place to a discrete transition are also anno- 
tated with a bounded delay assignment. An arc from a discrete transition to a 
continuous place is annotated with a variable assignment. Arcs from continuous 
transitions to discrete places are not allowed since they do not appear useful. 
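
The definitions above translate fairly directly into a data structure. The following is a minimal Python sketch, not the representation used by the authors' tool: the class name `THPN`, its field names, the `validate` method, and the tiny example net are all assumptions made for illustration. Arcs are stored as (source, target) pairs, and the check only enforces the restrictions stated in the text (annotations sit on arcs of the right kind, ranges are ordered, and F_h has no arcs leaving continuous transitions).

```python
from dataclasses import dataclass, field
from fractions import Fraction
from math import inf


@dataclass
class THPN:
    """Container for the tuple (N_d, N_c, F_h, B, D, A), flattened for brevity."""
    Pd: set   # discrete places
    Td: set   # discrete transitions
    Pc: set   # continuous places
    Tc: set   # continuous transitions
    Fd: set   # discrete flow relation, as (source, target) pairs
    Fc: set   # continuous flow relation
    Fh: set   # arcs linking the discrete and continuous nets
    m0: set   # initially marked discrete places
    R: dict   # continuous transition -> flow rate
    X0: dict  # continuous place -> (lower, upper) initial range
    B: dict = field(default_factory=dict)  # (p, t) -> predicate range
    D: dict = field(default_factory=dict)  # (p, t) -> delay range
    A: dict = field(default_factory=dict)  # (t, p) -> assignment range

    def validate(self):
        """Return True, or raise ValueError if an annotation is misplaced."""
        if any(src in self.Tc for (src, _dst) in self.Fh):
            raise ValueError("F_h may not contain arcs leaving continuous transitions")
        for (p, t), (lo, hi) in self.B.items():
            if not (p in self.Pc and t in self.Td and (p, t) in self.Fh and lo <= hi):
                raise ValueError(f"bad predicate on arc {(p, t)}")
        for (p, t), (lo, hi) in self.D.items():
            on_arc = (p, t) in self.Fd or (p, t) in self.Fh
            if not (t in self.Td and on_arc and 0 <= lo <= hi):
                raise ValueError(f"bad delay on arc {(p, t)}")
        for (t, p), (lo, hi) in self.A.items():
            if not (t in self.Td and p in self.Pc and (t, p) in self.Fh and lo <= hi):
                raise ValueError(f"bad assignment on arc {(t, p)}")
        return True


# Hypothetical net: d1 fills v1 while p1 is marked; t1 resets v1 once v1 >= 5.
net = THPN(
    Pd={"p1"}, Td={"t1"}, Pc={"v1"}, Tc={"d1"},
    Fd={("p1", "t1")}, Fc={("d1", "v1")},
    Fh={("v1", "t1"), ("p1", "d1"), ("t1", "v1")},
    m0={"p1"},
    R={"d1": Fraction(1)},
    X0={"v1": (Fraction(0), Fraction(0))},
    B={("v1", "t1"): (Fraction(5), inf)},
    D={("p1", "t1"): (Fraction(0), Fraction(0)),
       ("v1", "t1"): (Fraction(0), Fraction(2))},
    A={("t1", "v1"): (Fraction(0), Fraction(0))},
)
net.validate()
```

Exact rationals (`Fraction`) are used because the definitions range over Q; `math.inf` stands in for the unbounded ends of a range.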

The state of a THPN is defined using a 3-tuple of the form $(m, x, c)$ where:

$m \subseteq P_d$ is the set of currently marked discrete places;

$x : P_c \rightarrow (Q \cup \{-\infty, \infty\})$ is the current marking of the continuous places;

$c : ((P_c \cup P_d) \times T_d) \rightarrow (Q^+ \cup \{\infty\})$ is the current clock value of each arc.

A clock is associated with each arc to a discrete transition to denote how long the 
enabling condition on that arc has been satisfied (i.e., a discrete place associated 
with the clock becomes marked or the predicate for a continuous place associated 
with the clock becomes true). This clock is initialized to zero when its enabling 
condition becomes true, and all active clocks are advanced in lockstep. 

A discrete transition, $t \in T_d$, is discretely enabled when all discrete places
in its preset are marked (i.e., $\forall (p,t) \in F_d \,.\, p \in m$). A discrete transition $t$ is
enabled when it is discretely enabled and all predicates in its preset are true (i.e.,
$\forall (p,t) \in F_h \,.\, (b_l(p,t) \le x(p) \le b_u(p,t))$). A discrete transition $t$ is fireable when
it is enabled and all clocks in its preset have reached or exceeded their lower
bound (i.e., $\forall (p,t) \in F_d \cup F_h \,.\, c(p,t) \ge d_l(p,t)$). A discrete transition $t$ must
fire when it is fireable and all clocks in its preset have reached or exceeded their
upper bound (i.e., $\forall (p,t) \in F_d \cup F_h \,.\, c(p,t) \ge d_u(p,t)$).
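
The four nested conditions (discretely enabled, enabled, fireable, must fire) can be sketched as Python predicates. This is an illustrative sketch only: the function names, the dictionary encoding of B, D, x, and c, and the one-transition example are assumptions, and the predicate and delay bounds are treated as inclusive.

```python
from math import inf


def preset_arcs(t, arcs):
    """Arcs (p, t) that end at transition t."""
    return [(p, t2) for (p, t2) in arcs if t2 == t]


def discretely_enabled(t, Fd, m):
    # Every discrete place in the preset of t is marked.
    return all(p in m for (p, _) in preset_arcs(t, Fd))


def enabled(t, Fd, Fh, B, m, x):
    # Discretely enabled, and every predicate on an arc into t holds.
    predicates_hold = all(
        B[(p, t)][0] <= x[p] <= B[(p, t)][1] for (p, _) in preset_arcs(t, Fh)
    )
    return discretely_enabled(t, Fd, m) and predicates_hold


def fireable(t, Fd, Fh, B, D, m, x, c):
    # Enabled, and every clock into t has reached its lower delay bound.
    arcs = preset_arcs(t, Fd | Fh)
    return enabled(t, Fd, Fh, B, m, x) and all(c[a] >= D[a][0] for a in arcs)


def must_fire(t, Fd, Fh, B, D, m, x, c):
    # Fireable, and every clock into t has reached its upper delay bound.
    arcs = preset_arcs(t, Fd | Fh)
    return fireable(t, Fd, Fh, B, D, m, x, c) and all(c[a] >= D[a][1] for a in arcs)


# Hypothetical one-transition net: t1 needs p1 marked and v1 >= 5.
Fd = {("p1", "t1")}
Fh = {("v1", "t1")}
B = {("v1", "t1"): (5, inf)}
D = {("p1", "t1"): (0, 0), ("v1", "t1"): (1, 2)}
m, x = {"p1"}, {"v1": 6}
clocks = {("p1", "t1"): 3, ("v1", "t1"): 1.5}
print(fireable("t1", Fd, Fh, B, D, m, x, clocks))
print(must_fire("t1", Fd, Fh, B, D, m, x, clocks))
```

In this example the transition is fireable but not yet forced, because the clock on the continuous arc (1.5) has passed its lower bound of 1 but not its upper bound of 2.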

A continuous transition, $t \in T_c$, is enabled when all discrete places in its
preset are marked. When enabled, continuous transitions fire continuously at
their specified rate. The velocity of a continuous transition is defined as follows:

$$v(t,m) = \begin{cases} R(t) & \text{if } \forall (p,t) \in F_h \,.\, p \in m \\ 0 & \text{otherwise} \end{cases}$$

To calculate the instantaneous rate of change of a continuous place for a given
marking, $\dot{x}(p,m)$, the velocities of the outgoing transitions are subtracted from
the sum of the velocities of the incoming transitions. This is defined as follows:

$$\dot{x}(p,m) = \sum_{(t,p) \in F_c} v(t,m) - \sum_{(p,t) \in F_c} v(t,m)$$
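
As a small worked illustration of the velocity and rate-of-change definitions, the sketch below evaluates both for a toy net. The function names, the dictionary encoding, and the rates 2 and 3 are hypothetical; the place and transition names are borrowed from the PLL example only for readability.

```python
def velocity(t, m, R, Fh):
    """R(t) when every discrete place feeding t through F_h is marked, else 0."""
    return R[t] if all(p in m for (p, t2) in Fh if t2 == t) else 0


def rate_of_change(p, m, R, Fc, Fh):
    """Sum of incoming velocities minus sum of outgoing velocities for place p."""
    incoming = sum(velocity(a, m, R, Fh) for (a, b) in Fc if b == p)
    outgoing = sum(velocity(b, m, R, Fh) for (a, b) in Fc if a == p)
    return incoming - outgoing


# Hypothetical fragment: inc feeds v2 while "up" is marked,
# dec drains v1 while "down" is marked.
R = {"inc": 2, "dec": 3}
Fc = {("inc", "v2"), ("v1", "dec")}
Fh = {("up", "inc"), ("down", "dec")}

print(rate_of_change("v2", {"up"}, R, Fc, Fh))
print(rate_of_change("v1", {"up"}, R, Fc, Fh))
print(rate_of_change("v1", {"up", "down"}, R, Fc, Fh))
```

With only `up` marked, v2 rises at rate 2 and v1 is stationary; once `down` is also marked, v1 falls at rate 3.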

The state of a THPN can change by the firing of a discrete transition or
the advancement of some amount of time $\tau$ which is less than $\tau_{max}$ (defined
below). If $t$ is a fireable discrete transition in the state $(m,x,c)$, the firing of
$t$ updates the discrete marking by removing tokens from the discrete places in
the preset of $t$ and placing tokens into the discrete places in the postset of $t$
(i.e., $m' = m - \{p' \mid (p',t) \in F_d\} + \{p' \mid (t,p') \in F_d\}$). Clocks in the postset of $t$
are also initialized to 0. Finally, all continuous places, $p$, in the postset of $t$ are
assigned a new range of values given by $A(t,p)$. If $\tau$ time units can advance in
the state $(m,x,c)$ (i.e., $\tau < \tau_{max}$), the advancement of $\tau$ time units updates
the continuous value of tokens in all continuous places by the amount consistent
with the current rate of change (i.e., $\tau \cdot \dot{x}(p,m)$). Since the advancement of time
may have caused some predicates to become true, the corresponding clocks need
to be updated to reflect the amount of time in which these conditions have been
true. Finally, all other clocks are incremented by the time $\tau$.

The maximum time advancement, $\tau_{max}$, is limited to the amount of time that
can advance before a discrete transition must fire. A fireable discrete transition
must fire when all of the clocks in its preset have reached their upper bounds. For
a discretely enabled transition, $t \in T_d$, there are five possible cases in which this
can occur. The first is that a clock $c(p,t)$, where $p$ is a discrete place, is the last
one to expire (i.e., reach its time limit). This occurs after $d_u(p,t) - c(p,t)$ time
units have elapsed. The next four cases have to do with clocks associated with
continuous places in the preset of $t$. When its predicate is satisfied (i.e., $b_l(p,t) \le
x(p) \le b_u(p,t)$) such a clock is active, and the time limit is again simply $d_u(p,t) - c(p,t)$
time units. When such a clock is not active, we must first calculate how long
before it becomes active and add to this the upper bound of the timing delay
for this clock. When a continuous place is increasing in value (i.e., $\dot{x}(p,m) > 0$)
and it is below its lower bound (i.e., $x(p) < b_l(p,t)$), the clock becomes active
after $(b_l(p,t) - x(p))/\dot{x}(p,m)$ time units and expires $d_u(p,t)$ time units after that.
A similar time limit can be derived when the place is decreasing in value (i.e.,
$\dot{x}(p,m) < 0$) and it is above its upper bound (i.e., $x(p) > b_u(p,t)$). The final
case is when an increasing place is above its upper bound or decreasing place
is below its lower bound. In this case, no amount of time can make this clock
become active, so the time limit is infinite.
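
The five cases can be collected into one function per arc. The sketch below is an interpretation rather than the authors' code: the function names and argument layout are assumptions, the decreasing case uses $(b_u - x)/\dot{x} + d_u$ as the "similar time limit" the text alludes to, and the combination rule (a transition is forced when its last clock expires, and time may advance only until the first transition is forced) is read off the paragraph above.

```python
from math import inf


def arc_time_limit(c, du, x=None, xdot=0, bl=-inf, bu=inf):
    """Time until the clock on one arc into a discrete transition expires.

    x is None for an arc from a discrete place (cases with no predicate).
    """
    if x is None or bl <= x <= bu:
        return du - c                      # clock already active
    if xdot > 0 and x < bl:
        return (bl - x) / xdot + du        # rising towards the lower bound
    if xdot < 0 and x > bu:
        return (bu - x) / xdot + du        # falling towards the upper bound
    return inf                             # moving away: never becomes active


def tau_max(limits_per_transition):
    """Earliest forced firing: min over transitions of the last clock to expire."""
    return min((max(limits) for limits in limits_per_transition), default=inf)


print(arc_time_limit(c=0.5, du=2))                            # discrete arc
print(arc_time_limit(c=0, du=2, x=3, xdot=2, bl=5, bu=10))    # rising, below range
print(arc_time_limit(c=0, du=2, x=12, xdot=-2, bl=5, bu=10))  # falling, above range
print(arc_time_limit(c=0, du=2, x=12, xdot=1, bl=5, bu=10))   # rising, above range
```

A stationary variable outside its predicate range also falls into the last case, since no amount of time can activate its clock.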

Figure 2 shows a block diagram for a PLL. A PLL operates by accepting an
input frequency (Vin) for comparison to the oscillator frequency (Vosc) in the
phase detector. It generates the signal up when Vin is leading Vosc which results
in the VCO control voltage (Vctrl) being increased to increase the frequency of
Vosc. Similarly, if Vin is lagging Vosc, Vctrl is decreased. This is a mixed-signal
circuit because the signals Vin, Vosc, up, and down are digital while Vctrl is
analog. The THPN model is composed of three parts. The left portion of Fig. 1
generates a square input wave (Vin) with period 1000 with some jitter. The
center portion is the phase detector which compares the arrival time of Vin
and Vosc to determine whether up or down should go high. The right portion
models the VCO using continuous places to control the frequency of Vosc. When
v1 becomes empty, Vosc rises enabling d2 by marking the discrete place in its
preset. When v2 is empty, Vosc falls enabling d1. The continuous transitions
inc and dec adjust the contents of the continuous places by adding continuous
tokens to v2 and removing them from v1, respectively. This effectively adjusts
the frequency of Vosc over a continuous range. The initial frequency of Vosc
is determined by the initial value of v1 and v2. The PLL adjusts Vosc until it
matches the frequency and phase of Vin.
Fig. 2. Block diagram for a PLL.




Fig. 3. Representing differential equations using THPNs. 



3 Modeling Differential Equations 

Analog circuits are typically modeled using differential equations. This section 
describes how differential equations can be represented using THPNs. Variables 
in THPNs can only change at constant rates. Therefore, to analyze more compli- 
cated systems, the continuous operating ranges must be decomposed into regions 
in which the rate of change is assumed to be constant. Any discretization method 
with the goal of minimizing the resulting number of regions may be utilized; how- 
ever, to generate a THPN, regions must be rectangular in shape and have only a 
single neighboring region on each side in each dimension. We use a discretization 
approach similar to that proposed in [6,7]. 

After decomposing the continuous domain into regions, a THPN is generated 
as shown in Figure 3. This figure shows the region where $0 \le x \le x_1$ and $0 \le y \le
y_1$. Each region is modeled by two continuous transitions which feed continuous
places x and y and are enabled by a discrete place in each dimension. Within each
region, x and y can increase or decrease by constant rates. Discrete transitions in 
each dimension allow transitions into new regions when x and y increase beyond 
or decrease below region boundaries. Each region has a corresponding THPN 
representation resulting in a net that approximates the surface. 
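
To make the decomposition concrete, the following sketch assigns one constant rate pair to each rectangular cell of a grid. It is only an illustration under stated assumptions: the function `region_rates` is hypothetical, and sampling the derivative at the cell centre is a simplification chosen here, not the discretization of [6,7]. The example system (dx/dt = y, dy/dt = -x) is likewise made up.

```python
def region_rates(f, xs, ys):
    """Map each grid cell (i, j) to the (dx/dt, dy/dt) sampled at its centre.

    xs and ys are the sorted region boundaries in each dimension, so the
    cells are rectangular and each has one neighbour per side.
    """
    rates = {}
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx = (xs[i] + xs[i + 1]) / 2
            cy = (ys[j] + ys[j + 1]) / 2
            rates[(i, j)] = f(cx, cy)
    return rates


def oscillator(x, y):
    # Hypothetical example system: dx/dt = y, dy/dt = -x.
    return (y, -x)


rates = region_rates(oscillator, xs=[-1.0, 0.0, 1.0], ys=[-1.0, 0.0, 1.0])
for cell in sorted(rates):
    print(cell, rates[cell])
```

Each entry would correspond to one region of the THPN: two continuous transitions carrying the two rates, enabled by the discrete places that identify the cell.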
4 Reachability Analysis

The state space of THPNs is infinite in that each of the clocks and continuous 
places can take on any real value. Therefore, to perform reachability analysis 
on THPNs, it is necessary to represent the infinite state space using a method 
that groups the infinite number of states into equivalence classes. Many meth- 
ods developed for timed models utilize convex polygons, or zones, to represent 
ranges of clock values that have equivalent behavior [24,10,9]. These methods 
use a difference bound matrix (DBM) to represent these polygons efficiently. We
propose to use DBMs to represent sets of states of THPNs by including both
the clock values and the values of the continuous places in the DBM. Each set
of states is represented with the tuple $s = \langle m, dbm, val, c_i, c_f \rangle$ where:

$m \subseteq P_d$ is the set of marked discrete places;

$dbm : ((P_d \times T_d) \cup (P_c \times T_d) \cup P_c \cup \{p_0\}) \times ((P_d \times T_d) \cup (P_c \times T_d) \cup P_c \cup \{p_0\}) \rightarrow Q$ is
a zone composed of active clocks, active continuous variables, and $p_0$ which
is always 0;

$val : P_c \rightarrow (Q \cup \{-\infty, \infty\}) \times (Q \cup \{-\infty, \infty\})$ is the set of inactive continuous
variables (i.e., $\dot{x}(p,m) = 0$) and their corresponding values;

$c_i \subseteq (P_c \times T_d)$ is the set of clocks between continuous places and discrete
transitions that have been introduced since the last change in direction;

$c_f \subseteq (P_d \times T_d) \cup (P_c \times T_d)$ is the set of clocks that have fired (i.e., clocks for
which timing no longer needs to be tracked).

In the presentation of the algorithms, we use the notation $(p,t) \in dbm$ to determine
if the function dbm is defined for the clock $c(p,t)$ (i.e., $c(p,t)$ is an active
clock). Similarly, $p \in dbm$ is used to determine if a continuous variable is defined
in the dbm (i.e., active). In the initial state set, $m = m_0$, dbm includes all active
clocks $c(p,t)$ (i.e., $p \in m_0$ or $p \in P_c$ and $b_l(p,t) \le x_{0l} \le b_u(p,t)$) and active
continuous variables $x(p)$ (i.e., $\dot{x}(p,m) \ne 0$), val includes all inactive variables,
$c_i = \{(p,t) \mid (p,t) \in F_h \wedge (p,t) \in dbm\}$, and $c_f = \emptyset$.

A new zone based algorithm for reachability analysis of THPNs is shown in 
Fig. 4. The algorithm is essentially a depth first search of the state space. It 
begins by constructing the initial state set and adding it to the set of reachable 
states, S. The remainder of this section describes this algorithm in more detail. 

4.1 Finding Possible Events 

The algorithm shown in Fig. 5 determines all possible events that can result 
in a new state set and sorts them into a set of sets where each individual set 
represents a group of events that must occur simultaneously. There are three 
types of possible events: clock firing, clock introduction, and clock deletion. A 
clock $c(p,t)$ can fire if it is in the dbm and can reach its lower bound, $d_l(p,t)$.
A clock $c(p,t)$ can be introduced into the dbm if $p$ is a continuous variable,
and it can increase above its lower bound, $b_l(p,t)$, or decrease below its upper
bound, $b_u(p,t)$. The set $c_i$ is checked to prevent clocks from being introduced
reach()
  s = initial_state_set(), S = {s}, done = false
  E = find_possible_events(s)
  while (¬done)
    e = select(E)
    if (E − {e} ≠ ∅) then push(s, E − {e})
    s' = do_events(s, e)
    if (s' ∉ S) then
      S = S ∪ {s'}, s = s'
      E = find_possible_events(s)
    else
      if (stack not empty) then (s, E) = pop()
      else done = true
  return S



Fig. 4. Reachability analysis algorithm for THPNs. 



find_possible_events(⟨m, dbm, val, c_i, c_f⟩)
  E = ∅
  foreach (p,t) ∈ dbm
    if (dbm[(p,t)][p_0] ≥ d_l(p,t)) then
      E = add_set_item(E, (p,t), fire)
  foreach p ∈ dbm
    foreach (p,t) ∈ F_h
      if ((p,t) ∉ dbm ∧ (p,t) ∉ c_f) then
        if ((((ẋ(p,m) > 0) ∧ (dbm[p][p_0] > b_l(p,t)/ẋ(p,m)) ∧
              (−dbm[p_0][p] < b_l(p,t)/ẋ(p,m))) ∨
             ((ẋ(p,m) < 0) ∧ (dbm[p][p_0] > b_u(p,t)/ẋ(p,m)) ∧
              (−dbm[p_0][p] < b_u(p,t)/ẋ(p,m)))) ∧ ((p,t) ∉ c_i)) then
          E = add_set_item(E, (p,t), intro_clk)
      else
        if (((ẋ(p,m) > 0) ∧ (dbm[p][p_0] > b_u(p,t)/ẋ(p,m)) ∧ (b_u(p,t) < ∞)) ∨
            ((ẋ(p,m) < 0) ∧ (dbm[p][p_0] > b_l(p,t)/ẋ(p,m)) ∧ (b_l(p,t) > 0))) then
          E = add_set_item(E, (p,t), del_clk)
  return E



Fig. 5. Algorithm for finding sets of possible events. 



multiple times without a change in direction on the given variable. A clock $c(p,t)$
can be deleted from the dbm if $p$ is a continuous variable, and it can increase
above its upper bound, $b_u(p,t)$, or decrease below its lower bound, $b_l(p,t)$. For
each new event found, the function add_set_item determines the set in which to
place this event. In other words, it determines which events it must occur with
simultaneously.
4.2 State Updates

After finding all possible events, the top level algorithm selects one possible event 
set, and pushes the current state set and all remaining event sets onto the stack. 
It then executes this event set to derive a successor state set. If the successor 
state set is a new state set, it adds it to the list of state sets, assigns the new state 
set to the current state set, and finds all possible next events. If the successor 
state set is not a new state set and the stack is not empty, it pops an entry off 
the stack. If the stack is empty, the algorithm terminates and returns the set of 
reachable state sets. Due to the continuous variables, the state space may not 
be finite in which case this algorithm does not terminate. 
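
The control flow of this search can be sketched independently of zones. The Python function below follows the same select, push, explore, and backtrack pattern, with the event finder and the successor function supplied by the caller. It is a generic sketch, not the ATACS implementation: the names and the toy integer state space are assumptions, states must be hashable, and, unlike the figure, the sketch also backtracks when a state has no possible events.

```python
def reach(initial, find_possible_events, do_events):
    """Depth-first exploration returning the set of reachable states."""
    s = initial
    seen = {s}
    stack = []
    events = list(find_possible_events(s))
    while True:
        if events:
            e, rest = events[0], events[1:]
            if rest:
                stack.append((s, rest))      # remember untried event sets
            successor = do_events(s, e)
            if successor not in seen:
                seen.add(successor)
                s = successor
                events = list(find_possible_events(s))
                continue
        if not stack:                        # nothing left to try
            return seen
        s, events = stack.pop()              # backtrack


# Toy state space (hypothetical): integers modulo 6 with two events.
def toy_events(n):
    return ["inc", "dbl"]


def toy_step(n, e):
    return (n + 1) % 6 if e == "inc" else (2 * n) % 6


print(sorted(reach(0, toy_events, toy_step)))
```

As in the paper's algorithm, termination depends on the successor function eventually producing only states that have been seen before.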

The algorithm for performing a set of events is shown in Fig. 6. This algorithm 
first restricts the dbm to allow enough time to advance such that each event in 
the event set can occur. It then recanonicalizes the zone using Floyd’s all-pairs 
shortest path algorithm. Next, it removes the clocks from the dbm that have 
fired and places them in the set of fired clocks, Cf. If all clocks in the preset 
of a transition have fired then a transition can fire which updates the state as 
described below. The dbm is then updated to add any newly introduced clocks 
and remove any deleted clocks. Next, advance_time is called to determine the
maximum time advancement that is possible without skipping any events. This
function initially sets the maximum value, $dbm[(p,t)][p_0]$, for each clock in the
zone to the upper bound of the clock, $d_u(p,t)$. The remainder of advance_time
determines the maximum value of each continuous variable, $dbm[p][p_0]$. If the
variable is increasing and its value is below the lower bound for a given predicate, 
it must not be allowed to advance beyond the predicate value until the clock for 
that arc is added to the zone. If it is above the lower bound and below the upper 
bound for some predicate, it must not be allowed to exceed the upper bound on 
that predicate until the corresponding clock has been removed from the zone. 
Finally, if it is above the upper bound on the predicate, the continuous variable 
can be unbounded. When the continuous variable is decreasing, the algorithm 
is similar except it approaches the upper bound from above and should not be 
allowed below the upper bound until the clock is introduced, below the lower 
bound until the clock is removed, and is allowed to reach negative infinity if it 
is below the lower bound. Finally, the zone is recanonicalized again. 
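
Recanonicalization tightens every entry of the DBM to the shortest path between the two variables, which is exactly what Floyd's all-pairs algorithm computes. The sketch below assumes a list-of-lists matrix in which entry `d[i][j]` is an upper bound on the difference between variable i and variable j and index 0 is the reference $p_0$; that encoding, the function names, and the two-variable example are assumptions for illustration.

```python
from math import inf


def recanonicalize(dbm):
    """Return the canonical (tightest) form of a DBM via Floyd's algorithm."""
    n = len(dbm)
    d = [row[:] for row in dbm]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_empty(canonical):
    """A canonical DBM with a negative diagonal entry describes no states."""
    return any(canonical[i][i] < 0 for i in range(len(canonical)))


# Variables in order [p0, x, y]: 1 <= x <= 5, 0 <= y <= 10, y - x <= 2.
zone = [
    [0, -1, 0],
    [5, 0, inf],
    [10, 2, 0],
]
for row in recanonicalize(zone):
    print(row)
```

Here the constraint y - x <= 2 combined with x <= 5 tightens the upper bound on y from 10 to 7, and the missing bound on x - y becomes 5.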

The algorithm to fire a transition is shown in Fig. 7. The firing of a transition 
t results in the marking being updated to remove tokens from all discrete places 
in the preset and add tokens to all discrete places in the postset of t. All fired 
clocks on discrete places in the preset of t are removed from Cf along with any 
that conflict with the firing transition. Next, the algorithm performs a variable 
assignment on each continuous place in the postset of the firing transition. This 
may result in a change in the predicate values on some arcs requiring some 
clocks to be introduced or deleted. The algorithm next adds a new clock to the 
dbm for each clock in the postset of t. The firing of a transition may change 
which continuous variables are active and their rates of change. The algorithm 
adds newly activated variables to the zone while it removes newly deactivated 
variables. Finally, the zone is warped to reflect the new rates of change. 
do_events(⟨m, dbm, val, c_i, c_f⟩, e)
  dbm = restrict(dbm, e)
  dbm = recanonicalize(dbm)
  (dbm, c_f) = fire_clocks(dbm, c_f, e)
  if (∃t . ∀(p',t) ∈ F_d ∪ F_h . (p',t) ∈ c_f) then
    ⟨m, dbm, val, c_i, c_f⟩ = fire_transition(⟨m, dbm, val, c_i, c_f⟩, t)
  (dbm, c_i, c_f) = intro_del_clocks(dbm, c_i, c_f, e)
  dbm = advance_time(dbm, c_i, c_f)
  dbm = recanonicalize(dbm)
  return ⟨m, dbm, val, c_i, c_f⟩

Fig. 6. Algorithm to perform a set of events. 

fire_transition(⟨m, dbm, val, c_i, c_f⟩, t)
  m' = m − {p' | (p',t) ∈ F_d} + {p' | (t,p') ∈ F_d}
  foreach (p',t) ∈ F_d
    foreach (p',t') ∈ F_d
      c_f = c_f − {(p',t')}
      dbm = dbm_remove(dbm, (p',t'))
  foreach (t,p') . p' ∈ P_c
    dbm = dbm_remove(dbm, p')
    val[p'] = (v_l, v_u)
  (dbm, c_i, c_f) = update_dbm(dbm, val, c_i, c_f)
  foreach (t,p') ∈ F_d
    foreach (p',t') ∈ F_d
      dbm = dbm_add(dbm, (p',t'), (0,0))
  foreach p' ∈ P_c
    if (ẋ(p',m') ≠ 0 ∧ p' ∉ dbm) then
      (dbm, val) = dbm_add(dbm, p', val[p'])
      c_i = update_c_i(m, m', dbm, c_i, p', val[p'])
      (dbm, val) = recanonicalize(dbm)
      val = val_remove(dbm, val, p')
    else if (ẋ(p',m') = 0 ∧ p' ∈ dbm) then
      if (ẋ(p',m) < 0) then swap(dbm[p_0][p'], dbm[p'][p_0])
      val[p'] = (dbm[p_0][p'], dbm[p'][p_0]) * |ẋ(p',m)|
      dbm = dbm_remove(dbm, p')
  (dbm, c_i) = dbm_warp(m, m', dbm, c_i)
  return ⟨m', dbm, val, c_i, c_f⟩

Fig. 7. Algorithm for firing a transition. 



4.3 Warping the Zone 

The state sets of a THPN cannot be represented directly by zones. Zones require
that all dimensions advance at rate one. In other words, the zone always evolves
along 45 degree lines. While clocks always advance with rate one, continuous
variables may increase or decrease with other rates. Therefore, this section
proposes the following new approximation approach which is key to our efficient and
yet accurate reachability analysis. In this approach, when a variable advances
with a rate other than one, the zone is warped in this dimension such that it
advances with rate one. For example, if $\dot{x}(p,m) = 2$, the variable $x(p)$ must be
replaced with $\frac{1}{2} * x(p)$ which does advance with rate one. This, however, has
the effect of warping the zone as shown with the darker polygon in Fig. 8(a).
The dbm, however, can only be used to represent polygons made with 45 and 90
degree angles. Therefore, this zone must be encapsulated in a larger zone (the
lighter gray box in Fig. 8(a)) that includes this zone while using only 45 and
90 degree angles. The algorithm for warping a dbm is shown in Fig. 9. The first
part of the algorithm performs the warping and encapsulation just described.
The third loop is used when a rate is negative which requires a more complicated
encapsulation. In this case, the zone must also be reflected along the axis
into the negative domain. This is accomplished by first swapping the minimum
and maximum entries in the zone. In the resulting zone, all 45 degree angles
become 225 degree angles which cannot be represented. This can be seen in the
darker box shown in Fig. 8(b). To address this problem, we must encapsulate
the zone in a box by setting all separations in the corresponding row and column
to infinity. The lighter gray zone is the result of the encapsulation.

Fig. 8. A zone warped by a (a) positive and (b) negative rate.
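
For a single variable, the effect of warping on its own bounds can be worked through directly. The sketch below scales the two DBM entries that bound one continuous variable against $p_0$ by the absolute rate ratio and swaps them when the sign of the rate flips, mirroring the scaling and swapping steps described above. It is a one-dimensional illustration only: the function name and argument layout are hypothetical, and the handling of entries that relate two different variables (the encapsulation step) is left out.

```python
def warp_bounds(neg_min, maximum, rate_old, rate_new):
    """Rescale the bounds of one warped variable after its rate changes.

    neg_min plays the role of dbm[p0][p] (the negated minimum) and maximum
    the role of dbm[p][p0]; both refer to the variable divided by its rate.
    """
    ratio = rate_old / rate_new
    neg_min, maximum = abs(ratio) * neg_min, abs(ratio) * maximum
    if ratio < 0:
        # Direction reversed: reflect the interval by swapping the entries.
        neg_min, maximum = maximum, neg_min
    return neg_min, maximum


# x in [2, 6] advancing at rate 1 is stored as (-2, 6).
print(warp_bounds(-2, 6, rate_old=1, rate_new=2))    # x/2 lies in [1, 3]
print(warp_bounds(-2, 6, rate_old=1, rate_new=-2))   # x/(-2) lies in [-3, -1]
```

In both cases the stored quantity advances at rate one after the change, which is the property a zone requires.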

5 Experimental Results 

The analysis method has been automated within the ATACS tool [25], and it has
been applied to several examples. The first example is the PLL model shown in
Fig. 1. Our analysis finds all reachable states of this THPN model in 6.9 seconds
represented using 1012 zones.¹

Table 1 shows our verification results. The first half of the table consists of small
benchmark examples from the HyTech distribution. We automatically translated
the provided hybrid automata models into THPN models. These results
show that our verification results agree with HyTech's in comparable runtime.
Unfortunately, the larger benchmark examples include features not currently
supported by our tool such as ranges on the rates.

¹ All results are obtained on a 1.7GHz Pentium M computer with 1 GB of memory.
dbm_warp(m, m', dbm, c_i)
  foreach {i,j} | i ∈ dbm, j ∈ dbm, i ≠ j
    if (|ẋ(i,m)/ẋ(i,m')| > |ẋ(j,m)/ẋ(j,m')|) then
      dbm[i][j] = (ẋ(j,m) * dbm[i][j])/ẋ(j,m') +
                  (−1 * ẋ(j,m) * dbm[i][0])/ẋ(j,m') + (ẋ(i,m) * dbm[i][0])/ẋ(i,m')
      dbm[j][i] = (ẋ(j,m) * dbm[j][i])/ẋ(j,m') +
                  (−1 * ẋ(j,m) * dbm[0][i])/ẋ(j,m') + (ẋ(i,m) * dbm[0][i])/ẋ(i,m')
    else
      dbm[j][i] = (ẋ(i,m) * dbm[j][i])/ẋ(i,m') +
                  (−1 * ẋ(i,m) * dbm[j][0])/ẋ(i,m') + (ẋ(j,m) * dbm[j][0])/ẋ(j,m')
      dbm[i][j] = (ẋ(i,m) * dbm[i][j])/ẋ(i,m') +
                  (−1 * ẋ(i,m) * dbm[0][j])/ẋ(i,m') + (ẋ(j,m) * dbm[0][j])/ẋ(j,m')
  foreach p ∈ dbm ∧ p ∈ P_c
    dbm[p_0][p] = |ẋ(p,m)/ẋ(p,m')| * dbm[p_0][p]
    dbm[p][p_0] = |ẋ(p,m)/ẋ(p,m')| * dbm[p][p_0]
  foreach p ∈ dbm ∧ p ∈ P_c
    if (ẋ(p,m)/ẋ(p,m') < 0) then
      c_i = c_i − {(p,t) | (p,t) ∈ F_h ∧ ¬active(dbm,p,t)}
      swap(dbm[p_0][p], dbm[p][p_0])
      foreach i ∈ dbm
        if (p ≠ i ∧ i ≠ p_0) then
          dbm[i][p] = dbm[p][i] = ∞
  recanonicalize(dbm)
  return (dbm, c_i)



Fig. 9. Algorithm for warping the dbm. 



The second half of the table consists of versions of the tunnel diode oscillator shown
in Fig. 10 [7]. The numerical parameters used for this example are from [8].
The goal of verification is to ensure that $I_l$ oscillates for specific circuit
parameters and initial conditions. The ACTL property to verify is
$(AG(AF(I_l < 0.3\,\mathrm{mA}))) \wedge (AG(AF(I_l > 0.7\,\mathrm{mA})))$. The inequalities on the continuous variables
are modeled in the net by adding Boolean variables to the net which are true
when the inequalities hold. This circuit can be described with two differential
equations:

$$\frac{dV_c}{dt} = \frac{1}{C}\left(-h(V_c) + I_l\right)$$

$$\frac{dI_l}{dt} = \frac{1}{L}\left(-V_c - R \cdot I_l + V_{in}\right)$$

where $h$ is a piecewise model of the tunnel diode behavior:

$$h(V_d) = \begin{cases}
6.0105 V_d^3 - 0.9917 V_d^2 + 0.0545 V_d & 0 \le V_d \le 0.055 \\
0.0692 V_d^3 - 0.0421 V_d^2 + 0.004 V_d + 8.9579 \cdot 10^{-4} & 0.055 \le V_d \le 0.35 \\
0.2634 V_d^3 - 0.2765 V_d^2 + 0.0968 V_d - 0.0112 & 0.35 \le V_d \le 0.50
\end{cases}$$
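
The two differential equations and the piecewise diode model are straightforward to evaluate numerically, which is a useful sanity check before discretizing them. The sketch below is an illustration under stated assumptions: the coefficient table follows the piecewise model as read here (the three pieces agree to within a few microamps at the breakpoints, which supports that reading), the default parameter values are those given for Fig. 10 together with the 200 ohm resistance used in the experiments, and the function and variable names are made up.

```python
# (lower bound, upper bound, cubic coefficients a, b, c, d) for each piece of h.
PIECES = [
    (0.0, 0.055, (6.0105, -0.9917, 0.0545, 0.0)),
    (0.055, 0.35, (0.0692, -0.0421, 0.004, 8.9579e-4)),
    (0.35, 0.50, (0.2634, -0.2765, 0.0968, -0.0112)),
]


def cubic(coeffs, v):
    a, b, c, d = coeffs
    return ((a * v + b) * v + c) * v + d


def h(v):
    """Piecewise tunnel diode current for a diode voltage in [0, 0.5]."""
    for lo, hi, coeffs in PIECES:
        if lo <= v <= hi:
            return cubic(coeffs, v)
    raise ValueError("diode model is only defined for 0 <= v <= 0.5")


def derivatives(vc, il, r=200.0, vin=0.3, ind=1e-6, cap=1e-12):
    """Right-hand sides (dVc/dt, dIl/dt) of the oscillator equations."""
    dvc = (-h(vc) + il) / cap
    dil = (-vc - r * il + vin) / ind
    return dvc, dil


print(h(0.055), h(0.35))
print(derivatives(0.45, 0.00045))
```

Evaluating `derivatives` over a grid of (Vc, Il) values is one way to obtain the per-region rates that a discretized THPN needs.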



The system is modeled using our differential equation discretization method. 
Sixteen discrete regions are required to model the oscillatory/non-oscillatory 
Table 1. Benchmark verification results.

                                        ATACS                      HyTech
Example                        Zones  Runtime(s)  Verifies?  Runtime(s)  Verifies?
Water controller                  29     0.03       Yes         0.07       Yes
Temp controller                   45     0.05       Yes         0.12       Yes
Billiards                        140     0.09       Yes         0.14       Yes
Diode_osc (high precision)      2321     2.34       Yes       overflow     N/A
Diode_no_osc (high precision)   2529     2.41       No        overflow     N/A
Diode_osc (low precision)         44     0.04       No          9.5        No
Diode_no_osc (low precision)      35     0.06       No         13.7        No


Fig. 10. Tunnel diode oscillator circuit (Vin = 0.3 V, L = 1 μH, and C = 1 pF).



behavior of the circuit resulting in a THPN with 14 places, 49 transitions, 144
arcs, and 32 unique rates with up to 3 digits of precision. The property is verified
for a range of initial conditions in which $I_l$ is between 0.4 and 0.5mA and $V_c$ is
between 0.4 and 0.47V. As expected, the property verifies with $R = 200\,\Omega$ in
2.34s after finding 2321 zones, and the property did not verify with $R = 242\,\Omega$
in 2.41s after finding 2529 zones. We also attempted this verification using the
HyTech tool [14], but it is unable to complete due to arithmetic overflow errors.
HyTech can complete analysis with less precision on the rates, but the model of
the circuit no longer produces oscillation. Therefore, the verification results are
incorrect. Though runtimes are not reported for the Boolean mapping approach
in [7], we believe our analysis method is competitive in runtime, and that it is
more accurate since the continuous variables are modeled explicitly.

6 Conclusions and Future Work 

This paper introduces THPNs, a new hybrid Petri net model capable of rep- 
resenting systems composed of a mixture of digital and analog components. 
THPNs allow modeling of continuous values such as voltage and current while 
still being able to model discrete events. This paper also presents a method 
for representing a discretized differential equation model as a THPN. Finally, 
it develops an efficient reachability analysis algorithm for formal verification of
analog and mixed-signal designs modeled as THPNs. Our analysis method is
based on DBMs and relies on warping, a new way of manipulating a DBM to 
encapsulate behavior of continuous variables with differing rates. The model and 
analysis method are demonstrated by modeling and verifying relevant properties 
of several benchmark examples including a tunnel-diode oscillator circuit and a 
PLL circuit. Future work includes further investigation of automatic generation 
of THPNs from system specifications. We also plan to extend the capabilities 
of analysis to support more features such as ranges on the rates of continuous 
transitions. Finally, we plan to develop abstraction and partial order methods 
to reduce the size of the state space. 
Abstract. In this paper, we describe a first-order linear time temporal
logic (LTL) model checker based on multiway decision graphs (MDG).
We developed a first-order temporal language, $\mathcal{L}_{MDG}^{*}$, which expresses
a subset of many-sorted first-order LTL and extends an earlier language,
$\mathcal{L}_{MDG}$, defined for an MDG based abstract CTL model checking. We
derived a set of rules, enabling the transformation of $\mathcal{L}_{MDG}^{*}$ formulas
into generalized Büchi automata (GBA). The product of this GBA
and the abstract state machine (ASM) model is checked for language
emptiness. We have lifted two instances of the generalized Strongly Connected
Component (SCC)-hull (GSH) checking algorithm [17] to support
abstract data and uninterpreted functions based on operators available
in the MDG package. Experimental results have shown the superiority
of our tool compared to the same instances of GSH implemented with
BDDs in VIS.



1 Introduction 

Formal verification has received considerable attention from the electrical
engineering, computer science, and industry communities, and many BDD
based formal verification tools have been developed over the years. These, however,
suffer from the well-known state space explosion problem. Multiway Decision
Graphs (MDGs) [5] have been introduced as one way to reduce this problem.
MDGs are based on a many-sorted first-order logic with a distinction between
concrete and abstract sorts. Abstract variables are used to represent data signals,
while uninterpreted function symbols are used to represent data operations,
providing a more compact description of circuits with complex data paths. Many
MDG based verification applications have been developed during the last decade,
including invariant checking, sequential equivalence checking, and abstract CTL
model checking [21] of abstract state machines (ASM) [5]. The MDG tools are
available at [22].

In this paper we introduce a new MDG verification application by implementing
automata based model checking of a subset of first-order linear time
temporal logic (LTL). Generally, LTL model checking verifies a Kripke structure
with respect to a propositional linear time temporal logic (PLTL) formula. A
PLTL formula $\phi$ is valid if it is satisfied by all paths of the Kripke structure
$M$. The validation of $\phi$ can be done by converting its negation into a Generalized
Büchi Automaton (GBA) [19] $B_{\neg\phi}$, composing the automaton with the
model $M$, and checking its language emptiness [19]. The main idea of the work
we describe in this paper is to lift classical LTL model checking procedures to
the language emptiness checking (LEC) of a GBA encoded with MDGs. To this
end, we define an extended temporal logic, called $\mathcal{L}_{MDG}^{*}$, for which we have
developed a set of derivation rules that transform $\mathcal{L}_{MDG}^{*}$ properties into PLTL
formulas augmented with a transformation circuit, which will be composed with
the system model (ASM) under verification. We use an automata generator to
get a GBA for the negation of this PLTL formula. Language emptiness checking
based on two instances of the GSH algorithm [17] is finally performed on the
product of this GBA and the composed ASM described earlier. We call this new
MDG verification application MDG LEC.

The rest of the paper is organized as follows: Section 2 describes related
work. Section 3 overviews the notion of multiway decision graphs. Section 4
defines the first-order linear temporal logic $\mathcal{L}_{MDG}^{*}$ and related transformation
rules. Section 5 describes the language emptiness checking algorithms. Section
6 provides a case study applying the developed MDG LEC tool on an ATM
(Asynchronous Transfer Mode) switch fabric. Finally, Section 7 concludes the
paper.



2 Related Work 

The idea of first-order temporal logic model checking is not new. For instance,
Bohn et al. [3] presented an algorithm for checking a first-order CTL specification
on first-order Kripke structures, an extension of "ordinary" Kripke structures by
transitions with conditional assignments. The algorithm separates the control
and data parts of the design and generates the first-order verification conditions
on data. The control part can be verified with Boolean model checking, while
the data part of the design has to be verified using interactive theorem proving.
Compared to this work, our logic is less expressive since $\mathcal{L}_{MDG}^{*}$ cannot accept
existential quantification. However, in our approach the property is checked on
the whole model automatically, while in [3] a theorem prover is needed to validate
the first-order verification conditions. Besides, our method can be applied to
any finite state model, while their application is limited to designs that can be
separated into data and control parts.

Hojati et al. [13] proposed an integer combinational/sequential (ICS)
concurrency model to describe hardware systems with datapath abstraction. They
used symbols such as finite relations, interpreted and uninterpreted integer
functions and predicates, and verified ICS models using language containment.
For a subclass of “control-intensive” ICS models, integer variables in the
model can be replaced by enumerated variables, hence enabling verification at
the Boolean level without sacrificing accuracy. Compared to ICS, our ASM
models are more general in the sense that the abstract sort variables in our
system can be assigned any value in their domain, instead of a particular
constant or function of constants as in ICS models. For the class of ICS
models where finite instantiations cannot be used, our verification system can
still compute all the reachable states and check properties.

Cyrluk and Narendran [6] defined a first-order temporal logic, ground temporal
logic (GTL), which, for universally quantified computation paths, falls
between first-order and propositional temporal logics. GTL models consist of
first-order language interpretation models and infinite sequences of states.
The validity problem is the same as checking of an LTL formula. The authors
further identified a decidable fragment of GTL, which consists of □P
(always P) formulas, where P is a GTL formula containing only an arbitrary
number of “Next” operators. For this decidable fragment, however, they did not
show how to build the decision procedure. Compared to [6], our
$\mathcal{L}_{MDG^*}$ is more expressive (cf. Section 4).

In [4], Burch and Dill also presented a subset of first-order logic,
specifically, the quantifier-free logic of equality with uninterpreted
functions, to specify properties for verifying microprocessor control
circuitry. Their method is appropriate for verification of microprocessor
control because it allows abstraction of datapath values and operations.
However, their approach cannot verify liveness properties.

Based on MDGs, Xu et al. [21] developed an abstract CTL model checking tool,
which verifies an ASM with respect to a first-order temporal logic
($\mathcal{L}_{MDG}$). $\mathcal{L}_{MDG}$ consists of a limited set of
templates including: A(P), AG(P), AF(P), A(P)U(Q), AG(P → F(Q)), and
AG((P) → ((Q)U(R))), where P, Q, and R are Next_let_formulas.¹ This MDG tool
does not allow temporal operator nesting and cannot deal with properties
beyond its templates. For example, a property like
G(a = 1 → F(b = 1) ∧ F(c = 1)) cannot be expressed in $\mathcal{L}_{MDG}$.

3 Multiway Decision Graphs 

The underlying logic of MDGs is a many-sorted first-order logic with a
distinction between concrete and abstract sorts. Concrete sorts have an
enumeration, while abstract sorts do not. This distinction leads to the
definition of concrete variables, abstract variables, individual constants
appearing in the enumeration, generic constants of abstract sorts, abstract
function symbols and cross-operators [5]. Let $f$ denote a function symbol of
type $\alpha_1 \times \cdots \times \alpha_n \to \alpha_{n+1}$. If
$\alpha_{n+1}$ is an abstract sort, then $f$ is an abstract function symbol;
if $\alpha_{n+1}$ is a concrete sort and at least one of the sorts
$\alpha_1, \ldots, \alpha_n$ is abstract, then $f$ is a cross-operator.

An interpretation is a mapping $\psi$ that assigns a denotation to each sort,
constant and function symbol and satisfies the following conditions: 1) The
denotation $\psi(\alpha)$ of an abstract sort $\alpha$ is a non-empty set;
2) If $\alpha$ is a concrete sort with enumeration $\{a_1, \ldots, a_n\}$ then
$\psi(\alpha) = \{\psi(a_1), \ldots, \psi(a_n)\}$, and
$\psi(a_i) \neq \psi(a_j)$ for $1 \leq i < j \leq n$; 3) If $c$ is a generic
constant of sort $\alpha$, then $\psi(c) \in \psi(\alpha)$; 4) If $f$ is a
function symbol of type
$\alpha_1 \times \cdots \times \alpha_n \to \alpha_{n+1}$, then $\psi(f)$ is a
function mapping from $\psi(\alpha_1) \times \cdots \times \psi(\alpha_n)$
into the set $\psi(\alpha_{n+1})$; 5) A variable assignment with the domain
$X$ compatible with an interpretation $\psi$ is a function $\phi$ that maps
every variable $x \in X$ of sort $\alpha$ to an element $\phi(x)$ of
$\psi(\alpha)$. $\Phi_X^\psi$ is used to represent the set of
$\psi$-compatible assignments of the variables in $X$;
$\psi, \phi \models P$ means that the formula $P$ is true under an
interpretation $\psi$ and a $\psi$-compatible variable assignment $\phi$;
$\psi \models P$ represents $\psi, \phi \models P$ for every
$\psi$-compatible variable assignment $\phi$; and $\models P$ means
$\psi \models P$ for all $\psi$. Two formulas $P$ and $Q$ are logically
equivalent iff $\models P \Leftrightarrow Q$.

¹ Since our $\mathcal{L}_{MDG^*}$ is an extension of $\mathcal{L}_{MDG}$,
Next_let_formula will be explained in Section 4.

An MDG is a finite directed acyclic graph. An internal node of an MDG can be
labeled with a concrete variable, with its edge labels being the individual
constants in the enumeration of the sort; or with an abstract variable, with
its edges labeled by abstract terms of the same sort; or with a
cross-operator, with its edge labels being the individual constants. An MDG
may have only one leaf node, denoted as T, which means that all paths in the
MDG correspond to true formulas. MDGs are used to represent relations as well
as sets in abstract state machines (ASM). An ASM is defined as a tuple
$D = (X, Y, W, F_I, F_T, F_O)$, where $X$, $Y$ and $W$ are disjoint finite
sets of input, state, and output symbols, respectively. $F_I$ is an MDG
representing the initial states, $F_T$ is an MDG for the state transition
relation, and $F_O$ is an MDG for the output relation.

The MDG package provides a set of basic operators, including: conjunction
(Conj) of a set of MDGs with different nodes of abstract variables;
disjunction (Disj) of a set of MDGs; relational product (RelP), which computes
conjunction, existential quantification, and renaming substitution in one
pass; and pruning by subsumption (PbyS), which produces an MDG representing
the difference of two abstract sets, given by MDGs P and Q, by pruning the
edges from P contained in Q. Finally, a procedure ReAn(G, C) computes the set
of reachable states of a state machine $M = (S_I, R_T, R_O)$ represented by
D, with any interpretation $\psi$, while checking that the invariant
condition C holds
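in all reachable states. In M,
$S_I = Set_\psi(F_I) = \{\phi \in \Phi_Y^\psi \mid \psi, \phi \models (\exists U) F_I\}$,
$R_O = Set_\psi(F_O) = \{(\phi, \phi', \phi'') \in \Phi_X^\psi \times \Phi_Y^\psi \times \Phi_W^\psi \mid \psi, \phi \cup \phi' \cup \phi'' \models F_O\}$, and
$R_T = Set_\psi(F_T) = \{(\phi, \phi', \phi'') \in \Phi_X^\psi \times \Phi_Y^\psi \times \Phi_{Y'}^\psi \mid \psi, \phi \cup \phi' \cup \phi'' \models F_T\}$.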

4 MDG Language Emptiness Checking Approach
4.1 MDG LEC Tool Structure

The structure of MDG LEC is shown in Figure 1. It takes as inputs a property
in a first-order temporal logic $\mathcal{L}_{MDG^*}$ (to be defined later)
and a system design M modeled as an ASM. The tool first transforms the
$\mathcal{L}_{MDG^*}$ formula $\varphi$ into a set of atomic propositions
($AP_\varphi$) augmented by a circuit $C_\varphi$, which constructs all basic
logic operators as well as equality conditions. The details of the
construction are given in Section 4.3. The tool then builds the equivalent
PLTL formula $\phi$. $C_\varphi$ is further composed with M to produce a new
ASM M' by the composer, which connects the ASM variables (input, state and
output signals) in $C_\varphi$ with the system model. Using $AP_\varphi$, we
reconstruct a syntactically equivalent PLTL property formula $\phi$, which we
feed into an automata generator producing a GBA $B_{\neg\phi}$. The latter is
then composed with M' to produce an ASM M'' and a set of fairness conditions.
Finally, the tool checks whether the language of the composed machine is
empty using an adapted forward generalized Strongly Connected Component
(SCC)-hull (GSH) algorithm [17]. In the following, we give a definition for
$\mathcal{L}_{MDG^*}$ and detail the transformation procedure. The checking
algorithm is described in Section 5.




Fig. 1. Structure of MDG LEC



Given a PLTL formula $\phi$, there exist many procedures to generate a GBA
for $\phi$, such as GPVW [11], GPVW+ [11], LTL2AUT [7], and Wring [10].
Although these procedures are based on the same tableau construction method,
Wring improves on the others by applying rewriting to the PLTL formula and
simplifying the resulting GBA. We have chosen the Wring procedure for the
automaton generation in MDG LEC. The constructed GBA is an ω-automaton with
several sets of accepting states defined as the fairness condition. A run is
accepted if it contains at least one state of every accepting set infinitely
often. As a result, the language of the automaton is nonempty iff the
automaton contains a fair cycle, that is, a (reachable) cycle that contains
at least one state from every accepting set, or equivalently, a (reachable)
nontrivial SCC that intersects each accepting set [19].
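The nonemptiness criterion above can be illustrated on an explicit graph. The following is only a minimal sketch, assuming the automaton is given as a successor dictionary and the accepting sets as Python sets; the function names are hypothetical and unrelated to the MDG package, which works symbolically rather than by SCC enumeration.

```python
def strongly_connected_components(succ):
    """Tarjan's algorithm (recursive, so only suitable for small graphs)."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(comp)

    for v in list(succ):
        if v not in index:
            visit(v)
    return out


def reachable(succ, init):
    """All states reachable from the initial states."""
    seen, todo = set(init), list(init)
    while todo:
        v = todo.pop()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def language_nonempty(succ, init, accepting_sets):
    """True iff a reachable nontrivial SCC intersects every accepting set."""
    reach = reachable(succ, init)
    sub = {v: [w for w in succ.get(v, ()) if w in reach] for v in reach}
    for comp in strongly_connected_components(sub):
        nontrivial = len(comp) > 1 or any(v in sub[v] for v in comp)
        if nontrivial and all(comp & acc for acc in accepting_sets):
            return True
    return False
```

For example, with `succ = {0: [1], 1: [2], 2: [1], 3: [3]}` and initial state 0, the SCC {1, 2} is the only reachable nontrivial one, so the answer depends on whether every accepting set contains state 1 or state 2.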



4.2 A First-Order Temporal Logic 

Let $\mathcal{F}$ be a set of function symbols and $\mathcal{V}$ a set of
variables. We denote the set of terms freely generated from $\mathcal{F}$ and
$\mathcal{V}$ by $\mathcal{T}(\mathcal{F}, \mathcal{V})$. The syntax of an
$\mathcal{L}_{MDG^*}$ formula is given by the following grammar:



Sort               S  := abstract sort | concrete sort
Abstract sort         := α | β | γ | ...
Concrete sort         := α | β | γ | ...
Generic constant   C  := a | b | c | ...
Concrete constant  C  := a | b | c | ... | 0 | 1 | ...
Variable           V  := abstract variable | concrete variable | ordinary variable
Abstract variable     := x | y | z | ...
Concrete variable     := x | y | z | ...
Ordinary variable     := ...
Atomic formula     A  := true | false | Eq
                   Eq := V1 = V2 | V = C | V1 = C | C1 = C2 | V = T(F, V)
Next_let_formula   N  := A | !A | A & A | A || A | A → A | X A
                       | LET (V = V) IN A
Property           P  := A | !P | P1 & P2 | P1 || P2 | P1 U P2 | F P
                       | P1 R P2 | G P

Semantics. An infinite path $\pi$ in an ASM M is an infinite sequence of
states. We denote by $\pi^i$ the suffix of $\pi$ beginning with the state
$\pi_i$ of $\pi$. We use $Val_{\phi \cup \sigma}(t)$ to denote the value of
term $t$ under a $\psi$-compatible assignment $\phi$ (cf. Section 3) to state
variables, input variables, and output variables and a $\psi$-compatible
assignment $\sigma$ to the ordinary variables $v$. The satisfaction of an
$\mathcal{L}_{MDG^*}$ formula along a path $\pi$ under the $\psi$-compatible
assignment $\sigma$ to the ordinary variables $v$ is defined inductively as
follows.

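$\pi, \sigma \models t_1 = t_2$ iff $Val_{\pi_0 \cup \sigma}(t_1) = Val_{\pi_0 \cup \sigma}(t_2)$.

$\pi, \sigma \models$ LET $(v = t)$ IN $p$ iff $\pi, \sigma' \models p$,
where $\sigma' = (\sigma \setminus \{(v, \sigma(v))\}) \cup \{(v, Val_{\pi_0 \cup \sigma}(t))\}$.

$\pi, \sigma \models\ !p$ iff it is not the case that $\pi, \sigma \models p$.

$\pi, \sigma \models p\ \&\ q$ iff $\pi, \sigma \models p$ and $\pi, \sigma \models q$.

$\pi, \sigma \models p\ ||\ q$ iff $\pi, \sigma \models p$ or $\pi, \sigma \models q$.

$\pi, \sigma \models p \to q$ iff $\pi, \sigma \models\ !p$ or $\pi, \sigma \models q$.

$\pi, \sigma \models X p$ iff $\pi^1, \sigma \models p$.

$\pi, \sigma \models G p$ iff $\pi^j, \sigma \models p$ for all $j \geq 0$.

$\pi, \sigma \models F p$ iff $\pi^j, \sigma \models p$ for some $j \geq 0$.

$\pi, \sigma \models p\ U\ q$ iff for some $k \geq 0$, $\pi^k, \sigma \models q$, and $\pi^j, \sigma \models p$ for all $j$ $(0 \leq j < k)$.

$\pi, \sigma \models p\ R\ q$ iff for all $k \geq 0$, $\pi^k, \sigma \models q$, or there exists $j$ $(0 \leq j < k)$ such that $\pi^j, \sigma \models p$.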

An $\mathcal{L}_{MDG^*}$ formula is said to be satisfied in the machine D if
it is satisfied along a path of M; a formula is said to be valid in D if it is
satisfied along all paths of M.
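As an illustration of the temporal part of these semantics, the sketch below evaluates a formula on an ultimately periodic ("lasso") path. It is a simplification under stated assumptions: atoms are plain propositional labels rather than first-order equalities, LET is omitted, formulas are nested tuples, and the names `sat` and `holds` are hypothetical. U is computed as a least fixpoint and R as a greatest fixpoint over the finitely many positions of the lasso.

```python
def sat(f, states, loop_start):
    """Positions i of the lasso at which formula f holds.

    states[i] is the set of atoms true in the i-th state; the successor of
    the last state is states[loop_start].
    """
    n = len(states)
    nxt = lambda i: i + 1 if i + 1 < n else loop_start
    rec = lambda g: sat(g, states, loop_start)
    op = f[0]
    if op == "true":
        return set(range(n))
    if op == "false":
        return set()
    if op == "ap":
        return {i for i in range(n) if f[1] in states[i]}
    if op == "not":
        return set(range(n)) - rec(f[1])
    if op == "and":
        return rec(f[1]) & rec(f[2])
    if op == "or":
        return rec(f[1]) | rec(f[2])
    if op == "X":
        inner = rec(f[1])
        return {i for i in range(n) if nxt(i) in inner}
    if op == "F":                       # F p  =  true U p
        return rec(("U", ("true",), f[1]))
    if op == "G":                       # G p  =  false R p
        return rec(("R", ("false",), f[1]))
    if op == "U":                       # least fixpoint of q | (p & X Z)
        p, res = rec(f[1]), rec(f[2])
        while True:
            new = {i for i in p if nxt(i) in res} - res
            if not new:
                return res
            res |= new
    if op == "R":                       # greatest fixpoint of q & (p | X Z)
        p, res = rec(f[1]), rec(f[2])
        while True:
            bad = {i for i in res if i not in p and nxt(i) not in res}
            if not bad:
                return res
            res -= bad
    raise ValueError(f"unknown operator: {op}")


def holds(f, states, loop_start):
    """True iff f is satisfied along the lasso path starting at position 0."""
    return 0 in sat(f, states, loop_start)
```

For the path a, b, b, b, ... (given as `[{"a"}, {"b"}, {"b"}]` with `loop_start=1`), `a U b` and `X G b` hold at the first position while `G b` does not.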

4.3 $\mathcal{L}_{MDG^*}$ Transformation

As shown in Figure 1, the first step of MDG LEC is to transform the
$\mathcal{L}_{MDG^*}$ formula into a PLTL formula. This transformation is
obtained by generating a circuit description for each subformula
(Next_let_formula) in the property. The generated circuit provides a single
atomic proposition output, which replaces the entire Next_let_formula in the
original property. Applying the same procedure to each subformula results in
a simpler formula. The rules which govern this construction are given below:

- < V1 = V2 > = absComp(V1, V2), where absComp is a cross-operator, which
  denotes the truth of V1 = V2 in the current state of the circuit. The
  cross-operator is partially interpreted by the rewriting rule
  absComp(X, X) = 1, which can be interpreted as “the values of two abstract
  terms are equal if the two abstract terms are syntactically the same”.

- < V1 = C > = absComp(V1, C).

- < V1 = C > = comp(V1, C) for a concrete variable V1 and a concrete
  constant C, where comp is a concrete function.

- < LET (U = V) IN N >. Here, search for the referred ASM variable V in N,
  count the nesting depth n of the X operator between the LET operator and
  the atomic formula, and add a sequence of n “registers”. The input of the
  sequence is V, and its output is the ordinary variable U.

- N1 → N2 is handled as an abbreviation for !N1 || N2.

- &, ||, and ! are built using the logic gates “and”, “or” and “not”,
  respectively.

- < V = T > = buildterm(V, T), where buildterm is a function that implements
  a term T ∈ T(F, V). It builds the function symbols for each element in T
  and connects them together according to their order of appearance in the
  term. The inputs are the variables of V, while the output is the term T.
  The cross-operator absComp is used to denote the truth of V = T.

The function < . > is generalized to a property definition as follows:

1. For the atomic formula false or true, we generate the constant signal 0
   or 1, respectively.

2. < Top N >, where Top = U, R, G, F. Depending on the temporal operators
   before the Next_let_formula N, we rewrite N to (true & N) or (false | N).
   If the immediate operator is F, U, or R, we use (true & N); otherwise we
   use (false | N). This is done to make sure that the property is checked
   after n cycles (using registers) from the initial states, where n is the
   maximum nesting depth of the X operators in the property.

3. < P1 op P2 > = < P1 > OP < P2 >, where OP is an implementation of op.

4. < Top P > = Top < P >, where Top = U, R, G, F.

4.4 Illustrative Example 

To illustrate the $\mathcal{L}_{MDG^*}$ transformation approach, we use the
property

G((state = fetch_st & input = inc2) →
  F(LET (v = pc) IN (XXX(state = fetch_st & pc = inc(inc(v))))))

on an abstract counter introduced in [6]. Figure 2 shows the transformation
procedure for the above property. The property contains two Next_let_formulas

N1 = (state = fetch_st & input = inc2)

and

N2 = (LET (v = pc) IN (XXX(state = fetch_st & pc = inc(inc(v))))).

The circuit descriptions for N1 and N2, shown in the middle of the figure, are
derived by applying the rules described in the previous section. Thereafter,
the property G(< N1 > → F < N2 >) is transformed into G(p = 1 → F(q = 1)),
which is translated by Wring into the GBA shown on the right side of the
figure.

The generated GBA consists of a state transition graph (ASM) and a set of
acceptance (fairness) conditions, which will be used by the language emptiness
checking algorithm described in the next section.
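The register construction for N2 can be mimicked by a small simulation. This is only a sketch under assumptions: abstract terms are modelled as nested tuples, `inc` is an uninterpreted function that merely builds a term, and `abs_comp` returns 0 for syntactically different terms, which is a simplification (in the MDG setting the cross-operator is only partially interpreted, so that case stays open). The names `q_signal`, `abs_comp` and the state labels other than `fetch_st` are hypothetical.

```python
from collections import deque


def inc(term):
    """Uninterpreted function symbol: only builds the term inc(term)."""
    return ("inc", term)


def abs_comp(a, b):
    """Rewrite rule absComp(X, X) = 1; other cases simplified to 0 here."""
    return 1 if a == b else 0


def q_signal(trace, depth=3):
    """Atomic proposition q for N2 along a trace of (state, pc) pairs.

    A chain of `depth` registers delays pc, so that at cycle t the ordinary
    variable v carries the value pc had at cycle t - depth (XXX has depth 3).
    """
    regs = deque([None] * depth, maxlen=depth)
    out = []
    for state, pc in trace:
        v = regs[0]                      # output of the register chain
        ok = (v is not None and state == "fetch_st"
              and abs_comp(pc, inc(inc(v))) == 1)
        out.append(1 if ok else 0)
        regs.append(pc)                  # shift the current pc into the chain
    return out
```

On the trace `[("fetch_st", "p"), ("decode", "p"), ("exec", inc("p")), ("fetch_st", inc(inc("p")))]` the signal q becomes 1 only in the fourth cycle, three cycles after the value of pc was captured.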
Fig. 2. Circuit synthesis and automaton generation for
G((state = fetch_st & input = inc2) →
F(LET (v = pc) IN (XXX(state = fetch_st & pc = inc(inc(v))))))
(Figure annotations: comp: if x = a then y = 1 else y = 0; absComp is a
cross-term with the rewrite rule absComp(x, x) = 1.)

5 Language Emptiness Checking Algorithm 

5.1 Generic SCC Hull Algorithm 

The GSH algorithm is based on the following definitions described by
the μ-calculus [16,17].

$E\ p\ U\ q = \mu Z.\ q \vee (p \wedge EX\,Z)$, $\quad E\ p\ S\ q = \mu Z.\ q \vee (p \wedge EY\,Z)$

$EG\ p = \nu Z.\ p \wedge EX\,Z$, $\quad EH\ p = \nu Z.\ p \wedge EY\,Z$

$EF\ p = E\ true\ U\ p$, $\quad EP\ p = E\ true\ S\ p$,

where $\mu Z.\tau$ ($\tau$ stands for a μ-calculus formula) denotes the least
fixpoint of $\tau$, $\nu Z.\tau$ is its greatest fixpoint, $EX\,Z$ denotes
the direct predecessors (preimage) of the states in the set $Z$, and $EY\,Z$
denotes the direct successors (image) of $Z$.

Let $G = (V, E)$ be a graph, $CL = \{C_1, \ldots, C_m\} \subseteq 2^V$ a set
of Büchi fairness conditions, and $T_F = \{ES_1, \ldots, ES_m, EY\}$ a set of
forward operators over $V$, where $ES_i$ is defined as
$\lambda Z.\ E\ Z\ S\ (Z \wedge C_i)$ and $EY$ as
$\lambda Z.\ Z \wedge EY\,Z$. Similarly, let
$T_B = \{EU_1, \ldots, EU_m, EX\}$ be a set of backward operators over $V$,
where $EU_i$ is defined as $\lambda Z.\ E\ Z\ U\ (Z \wedge C_i)$ and $EX$ as
$\lambda Z.\ Z \wedge EX\,Z$. The GSH algorithm first computes the set of
states reachable from the initial states, and then recursively removes the
states that cannot be reached by fair SCCs as well as those that cannot reach
fair SCCs, until it reaches a fixpoint. The GSH algorithm can be summarized as
follows:

Step 1) Calculate the set of states $Z$ reachable from the initial states.

Step 2) Fairly pick an operator $\tau$ from $T_F \cup T_B$. Apply $\tau$ to
$Z$, and let $Z = \tau(Z)$.

Step 3) Check whether $Z$ is a fixpoint of all the operators in
$T_F \cup T_B$. If yes, stop; otherwise, go to Step 2.
Remark 1. In Step 2, “fairly pick” means the operator $\tau$ is selected under
two restrictions: the operator ES (or EU) cannot be selected again unless
other operators in $T_F$ ($T_B$) make changes to $Z$, and the operator EX (or
EY) cannot be selected again if it makes no changes to $Z$ unless other
operators in the set $T_F$ (or $T_B$) make changes to $Z$.

Remark 2. The forward operators in $T_F$ remove the states that cannot be
reached by the fair cycles, where $ES_i$ removes the states that cannot be
reached from the accepting set $C_i$ within the current set, and EY deletes
the states that cannot be reached from a cycle within the set. The backward
operators in $T_B$ remove the states that cannot reach any fair cycle, where
$EU_i$ removes the states that cannot reach the accepting set $C_i$ within
the current set, and EX deletes the states that cannot reach a cycle within
the set.

5.2 EL and EL2 Algorithms 

The EL [8], EL2 [9], and HH [12] algorithms have been proposed as instances
of the GSH algorithm by specifying a particular order in which to pick
operators, namely:

$ES_1, ES_2, \ldots, ES_m, EY, \ldots, EY, ES_1, ES_2, \ldots$  (EL2)

$ES_1, EY, ES_2, EY, \ldots, ES_m, EY, ES_1, EY, \ldots$  (EL)

$EU_1, ES_1, \ldots, EU_m, ES_m, EX, EY, EX, EY, \ldots, EU_1, ES_1, \ldots$  (HH).

In the following, we will focus on the EL and EL2 algorithms, which can be
implemented with only forward operators.
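To make the forward operators concrete, the following sketch applies them in the EL order on an explicit-state graph. It is an illustration only, assuming a successor dictionary, Python sets for the accepting sets, and at least one accepting set; the function names are hypothetical and the code does not reflect the symbolic MDG implementation.

```python
def post(succ, states):
    """EY: direct successors (image) of a set of states."""
    return {w for v in states for w in succ.get(v, ())}


def reachable(succ, init):
    """States reachable from init (least fixpoint of the image operator)."""
    seen, frontier = set(init), set(init)
    while frontier:
        frontier = post(succ, frontier) - seen
        seen |= frontier
    return seen


def es(succ, z, c):
    """ES_i = lambda Z. E Z S (Z & C_i): states of z reachable from z & c
    along paths that stay inside z."""
    result = set(z & c)
    frontier = set(result)
    while frontier:
        frontier = (post(succ, frontier) & z) - result
        result |= frontier
    return result


def ey(succ, z):
    """lambda Z. Z & EY Z: states of z that have a predecessor in z."""
    return z & post(succ, z)


def el_hull(succ, init, accepting_sets):
    """Apply ES_1, EY, ES_2, EY, ..., ES_m, EY repeatedly until Z is stable.

    The language is empty iff the returned set is empty.
    """
    z = reachable(succ, init)
    while True:
        old = set(z)
        for c in accepting_sets:
            z = es(succ, z, c)
            z = ey(succ, z)
        if z == old:
            return z
```

With `succ = {0: [1], 1: [2], 2: [1], 3: [3]}`, initial state 0 and accepting sets `[{1}, {2}]`, the hull is `{1, 2}`, so a fair cycle exists; with the single accepting set `{0}` the hull is empty.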

5.3 MDG Implementation of EL and EL2 Algorithms 

Two main operators are needed for the implementation of the EL and EL2
algorithms, namely, image computation and fixpoint computation. Image
computation (forward operator) is sufficient since the GSH algorithm can be
implemented by using only forward operators [17]. Furthermore, since the MDG
package has no conjunction operation for MDGs with the same abstract primary
variables, the operators $ES_i$ and $\lambda Z.\ Z \wedge EY\,Z$ cannot be
applied directly. A closer look reveals that, under the assumption that $Z$
is forward-closed, the operators $ES_i$ and $\lambda Z.\ Z \wedge EY\,Z$ can
be replaced by $EP_i$ and $\lambda Z.\ EY\,Z$, respectively, where $EP_i$ is
defined by²

$EP_i = \lambda Z.\ (E\ true\ S\ (Z \wedge C_i))$

Following the above argument, the operators in the GSH algorithm can be
replaced with $T_F = \{EP_1, EP_2, \ldots, EP_m, \lambda Z.\ EY\,Z\}$ if only
forward operators are used in the GSH algorithm. Below, we will focus on how
to implement the above operators in the MDG package. Note that in the above
definition $EP_i$ still contains a conjunction operation $Z \wedge C_i$.
However, since $Z$ and $C_i$ do not share any abstract variables (in fact
$C_i$ does not use any abstract variables), this conjunction operation is
feasible with MDGs.

² A complete proof can be found in [20].

Image Computation (EY) - Relational Product (RelP)

The operator EY, which computes the direct successors of a set of states $Z$,
can be implemented by the MDG operator RelP. The arguments of RelP are
$\{I, Z, F_T\}$, $(X \cup Y)$, and $Y' \to Y$, where the MDG $I$ represents
the set of inputs, $Z$ represents the current set, and the MDG $F_T$
represents the transition relation.

Fixpoint Computation (EP) - Reachability Analysis (ReAn)

Given a state transition graph G = (T, I), the EP operator mainly implements
the computation of the set of all states reachable from a given set.
Therefore,

$EP\,Z = E\ true\ S\ Z = \mu Y.\ Z \vee (true \wedge EY\,Y) = Z \vee EY\,Z \vee EY^2(Z) \vee \cdots$.

In the MDG package, the procedure ReAn(G, C) has been developed for implicit
state enumeration that tests if an invariant condition C is true at the
output of every set. The operator EP is implemented by ReAn(G, true).

The procedure ReAn(G, true) takes a state transition graph
$G = (X, Y, F_I, F_T)$ and returns the set of states reachable from the
initial states $F_I$ for any interpretation $\psi$. In the rest of the paper,
we will use ReAn(G) as a short form for ReAn(G, true).
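The unrolling of the least fixpoint above corresponds to a breadth-first frontier iteration. The sketch below is an explicit-state analogue only, assuming a successor dictionary and an invariant given as a Python predicate; `re_an` is a hypothetical stand-in and not the signature of the MDG procedure ReAn.

```python
def re_an(succ, init, invariant=lambda state: True):
    """Compute Z | EY Z | EY^2 Z | ... frontier by frontier.

    Returns (states explored so far, flag), where the flag is False as soon
    as a reached state violates the invariant.
    """
    reached = set()
    frontier = set(init)
    while frontier:
        if not all(invariant(s) for s in frontier):
            return reached | frontier, False
        reached |= frontier
        # image of the frontier, minus states that were already reached
        frontier = {w for v in frontier for w in succ.get(v, ())} - reached
    return reached, True
```

Calling `re_an(succ, {0})` with the default invariant plays the role of ReAn(G, true): it simply returns all states reachable from the initial ones.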

Remark 3. The ReAn(G) may be non-terminating when a set of states cannot 
be represented by a finite MDG due mainly to the presence of abstract variables 
and uninterpreted functions. Several solutions do exist to alleviate this problem, 
for example, the initial state generalization [1]. 

MDG EL/EL2 Algorithms 

The MDG-based EL and EL2 algorithms take as arguments a state transition
graph $G = (X, Y, F_I, F_T)$, where $X$ and $Y$ are sets of input and state
variables, respectively, $F_I$ is the set of initial states, and $F_T$ is the
transition relation, together with a set $CL = \{C_1, \ldots, C_m\}$ of Büchi
acceptance conditions $C_i$, represented by MDGs consisting of concrete
variables.

The algorithms work as follows: first compute the set of states $Z$ reachable
from the initial states $F_I$, and then iteratively apply the operators
$EP_1, EP_2, \ldots, EP_m, \lambda Z.EY\,Z, \ldots, \lambda Z.EY\,Z$ (EL2) or
the operators
$EP_1, \lambda Z.EY\,Z, EP_2, \lambda Z.EY\,Z, \ldots, EP_m, \lambda Z.EY\,Z$
(EL) until no changes to $Z$ can be made. If the fixpoint is empty, the
algorithm returns “Succeed”; otherwise, it returns “Failed”. The EL algorithm
can be described as follows, where C, Z, I are sets and K is an integer.
1  MDG EL Algorithm (G = (X, Y, F_I, F_T), CL = {C_1, C_2, ..., C_m})
2  C := ∅; K := 0;
3  Z := ReAn(G);
4  Do {
5      C := Z;
6      For C_i ∈ CL {
7          F_I := Conj(Z, C_i); /* update the set of initial states */
8          Z := ReAn(G);
9          K := K+1;
10         I := NewInputs(K);
11         Z := RelP({I, Z, F_T}, X ∪ Y, Y' → Y)
12     }
13     If (Z = ∅) then return “Succeed”
14 } until (PbyS(C, Z) = F)
15 return “Failed”



In the above algorithm, line 3 computes the set of reachable states. In fact,
in this step ReAn(G) performs the operation $EP(F_I)$. Lines 4-14 represent
the main body of the algorithm. Lines 7-8 compute the set of states reached
by $Z \wedge C_i$, $i \leq m$. Lines 9-11 compute the image of $Z$, where
RelP essentially performs the operation $EY(Z)$. The operations are
iteratively applied to $Z$ until no changes are made to it. Obviously, in
each loop iteration $Z \subseteq C$, since the operations ReAn, Conj and RelP
remove some states from the set $Z$. Therefore, it is sufficient to test
$C \subseteq Z$ for the fixpoint $C = Z$. PbyS is used for this test at
line 14.

1  MDG EL2 Algorithm (G = (X, Y, F_I, F_T), CL = {C_1, C_2, ..., C_m})
2  C := ∅; K := 0;
3  Z := ReAn(G);
4  Do {
5      C := Z;
6      For C_i ∈ CL {
7          F_I := Conj(Z, C_i); /* update the set of initial states */
8          Z := ReAn(G);
9      }
10     K := K+1;
11     I := NewInputs(K);
12     Z1 := RelP({I, Z, F_T}, X ∪ Y, Y' → Y);
13     While (PbyS(Z, Z1) ≠ F) Do {
14         Z := Z1;
15         K := K+1;
16         I := NewInputs(K);
17         Z1 := RelP({I, Z, F_T}, X ∪ Y, Y' → Y);
18     }
19     If (Z = ∅) then return “Succeed”
20 } until (PbyS(C, Z) = F)
21 return “Failed”;

In the above algorithm, line 3 computes the set of reachable states by using
ReAn(G). Lines 4-20 compose the main body of the algorithm. Lines 6-9
compute the set of states reached by all $Z \wedge C_i$, $i \leq m$. Lines
10-18 compute the image of $Z$ until the fixpoint is reached. It is obvious
that $Z_1 \subseteq Z$ and $Z \subseteq C$ in each iteration, since the
operations ReAn, Conj and RelP remove some states from the set $Z$.
Therefore, it is sufficient to test $Z \subseteq Z_1$ for $Z_1 = Z$ and
$C \subseteq Z$ for $C = Z$. PbyS is used for these tests at lines 13 and 20.
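The control structure of the EL2 listing can be followed on explicit sets, with set difference playing the role of PbyS and plain reachability the role of ReAn. This is a hedged sketch, not the Prolog/MDG implementation: `mdg_el2`, `post` and `reach` are hypothetical names, inputs are ignored, and states are ordinary Python values instead of MDGs.

```python
def post(succ, states):
    """Image of a set of states (explicit-state counterpart of RelP)."""
    return {w for v in states for w in succ.get(v, ())}


def reach(succ, init):
    """Forward reachability (explicit-state counterpart of ReAn)."""
    seen, frontier = set(init), set(init)
    while frontier:
        frontier = post(succ, frontier) - seen
        seen |= frontier
    return seen


def mdg_el2(succ, init, accepting_sets):
    """EL2 order: EP_1, ..., EP_m, then EY repeated until it is stable."""
    z = reach(succ, init)
    while True:
        c = set(z)                       # line 5: remember the old Z
        for ci in accepting_sets:        # lines 6-9: EP_i
            z = reach(succ, z & ci)
        z1 = post(succ, z)               # line 12: first image
        while z - z1:                    # line 13: PbyS(Z, Z1) is not empty
            z = z1
            z1 = post(succ, z)
        if not z:                        # line 19: empty hull
            return "Succeed"
        if not (c - z):                  # line 20: PbyS(C, Z) is empty
            return "Failed"
```

Because every set produced by `reach` is forward-closed, `post(z)` is always a subset of `z`, which is why the one-sided difference tests suffice, as argued in the text.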

Remark 4. As pointed out in Remark 3, ReAn(G) may be non-terminating, which
may lead to the non-termination of the EL and EL2 algorithms. The same
approaches can be applied to alleviate this problem.

6 Case Study 

We have implemented the proposed MDG algorithms in Prolog and integrated them
into the MDG package. We have conducted a number of experiments with small
benchmark designs as well as with larger case studies to test the performance
of our tool. In this section, we present the experimental results of the
verification of an Asynchronous Transfer Mode (ATM) switch fabric as a case
study. The experiments were carried out on a Sun Ultra-2 workstation with a
296 MHz CPU and 768 MB of memory.

The ATM switch we consider is part of the Fairisle network designed and
used at the Computer Laboratory of Cambridge University [15]. The ATM switch
consists of an input controller, an output controller and a switch fabric. In each
cycle, the input port controller synchronizes incoming data cells, appends control 
information, and sends them to the fabric. The fabric strips off the headers from 
the input cells, arbitrates between cells destined to the same port, sends success- 
ful cells to the appropriate output port controller, and passes acknowledgments 
from the output port controller to the input port controller. 

We use an RTL design of this ATM switch fabric with 4 inputs and 4
outputs defined as 8 variables of abstract sort (n-bit), modeled in MDG-HDL
[18]. In the following we discuss five sample properties, P1-P5. P1 and P2
are properties checking the acknowledgment procedure, involving no data
signals, while P3, P4, and P5 are properties checking the data switching
signals. P1, P2 and P5 are safety properties, while P3 and P4 are liveness
properties. Details about the ATM switch fabric model as well as the
specification of the above properties can be found in [20].

The experimental results of the verification of these properties with MDG LEC
are summarized in Table 1, including CPU time, memory usage and number of MDG
nodes generated. To compare our approach with BDD based language emptiness
checking methods, we also conducted experiments on the same ATM switch fabric
using the ltl_model_check option of the VIS tool [2]. Since VIS requires a
Boolean representation of the circuit, we modeled the data input and output
as Boolean vectors of 4 bits (which store the minimum header information),
8 bits, and 16 bits. Experimental results (see Tables 2 and 3) show that the
verification of P4 (8-bit) as well as of the properties P3-P5 (16-bit) did
not terminate (indicated by a “*”), while our MDG LEC was able to verify them
in a few minutes for n-bit (abstract) data. It is to be noted that VIS uses
very powerful cone-of-influence [14] model reduction algorithms, while our
MDG LEC does not perform any reduction on the model. When we turned off this
model reduction, VIS failed to verify any of the properties on the 4-bit,
8-bit and 16-bit models.

Table 1. Experimental Results with MDG LEC using EL and EL2

          MDG LEC (EL)                     MDG LEC (EL2)
          Time    Memory   # MDG           Time    Memory   # MDG
          (sec)   (MB)     Nodes           (sec)   (MB)     Nodes
P1        300     30       100325          312     32       100325
P2        288     32       100382          280     32       100382
P3        254     41       129782          338     52       157739
P4        314     42       131132          346     54       159325
P5        340     47       139255          329     46       131775



Table 2. Experimental Results with VIS using EL algorithm

      GSH 4-bit                    GSH 8-bit                    GSH 16-bit
      Time    Memory  # BDD        Time    Memory  # BDD        Time    Memory  # BDD
      (sec)   (MB)    Nodes        (sec)   (MB)    Nodes        (sec)   (MB)    Nodes
P1    27.2    52      3095941      31.2    52      3081297      36.8    52      3064133
P2    11.7    45      1670014      14.0    45      1659550      21.8    46      1699964
P3    12.4    40      1356704      899.8   629     98706620     *       *       *
P4    533.1   167     32770721     *       *       *            *       *       *
P5    14.7    44      1596773      321.8   137     35106376     *       *       *


Table 3. Experimental Results with VIS using EL2 algorithm

      GSH 4-bit                    GSH 8-bit                    GSH 16-bit
      Time    Memory  # BDD        Time    Memory  # BDD        Time    Memory  # BDD
      (sec)   (MB)    Nodes        (sec)   (MB)    Nodes        (sec)   (MB)    Nodes
P1    27.1    52      3095941      29.5    52      3081297      36.8    52      3064191
P2    11.5    45      1670014      14.1    45      1659550      21.7    46      1699964
P3    12.5    40      1356704      899     628     98706620     *       *       *
P4    533     166     32770721     *       *       *            *       *       *
P5    13.8    44      1596773      317.6   137     35198884     *       *       *

7 Conclusion

In this paper, we introduced a new application of the MDG tool set, which
implements a first-order LTL model checking algorithm. The tool, MDG LEC,
accepts abstract state machines (ASM) as system models, and properties
specified in a newly defined first-order logic ($\mathcal{L}_{MDG^*}$), which
extends a previously developed language ($\mathcal{L}_{MDG}$). We developed
rules enabling the transformation of $\mathcal{L}_{MDG^*}$ properties into
generalized Büchi automata (GBA), making use of the Wring procedure. For the
language emptiness checking, we adapted two instances (EL and EL2) of the
generic SCC-hull (GSH) algorithm using MDG operators. Experimental results
have shown that, thanks to the support of abstract data and uninterpreted
functions, our MDG LEC tool outperforms existing BDD based LTL model checkers
implementing EL and EL2 in VIS.

At present, we are investigating the generation of counterexamples, which,
unlike in BDD based approaches, is not straightforward since MDGs do not
support backward trace operators. We are also looking into ways to integrate
model reduction algorithms (e.g., cone-of-influence) in order to improve the
overall performance of the tool.
Abstract. We propose a novel approach to locating errors in complex
counterexamples of safety properties. Our approach measures the distance
between two state transition traces by the difference in their control flow.
With respect to this distance metric, our approach searches for a witness as
near as possible to the counterexample. We can then obtain the set of control
flow predicates that are assigned differently in the witness and the
counterexample. By running this witness-searching algorithm iteratively, we
obtain a prioritized list of predicates. A predicate with higher priority is
more likely to be the actual error. Experimental results show that our
approach is highly accurate.¹



1 Introduction 

Today, model checking is one of the most important formal verification
approaches. It is widely employed to verify software and hardware systems.
One of its major advantages in comparison to methods such as theorem proving
is the production of a counterexample, which explains how the system violates
some assertion.

However, it is a tedious task to understand the complex counterexamples
generated by model checking complex hardware systems. Therefore, how to
automatically extract useful information to aid the understanding of
counterexamples is an area of active research.

Many researchers [1,2,4,9] work on locating errors in counterexamples. They
first search for a witness as similar as possible to a counterexample.
Starting from the difference between them, the actual error can then be
located by performing a breadth-first source code check.

However, these approaches suffer from a serious problem called “multiple
nearest witnesses” (MNW). Under certain circumstances, there are multiple
nearest witnesses. The distances between the counterexample and these
witnesses are the same, and only one of them contains the actual error. The
first nearest witness found may then be far from the actual error. Thus,
these approaches need to perform a very deep breadth-first code check and
traverse a large fraction of the code before they find the actual error. In
this way, MNW significantly decreases the accuracy of error locating.

¹ Supported by the National Natural Science Foundation of China under Grant No.
90207019; the National High Technology Development 863 Program of China under
Grant No. 2002AA1Z1480

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 456-469, 2004.

© Springer-Verlag Berlin Heidelberg 2004

At the same time, if these approaches meet a node with a large number of
fan-ins during breadth-first code checking, they also need to traverse a large
number of nodes. This also significantly decreases the accuracy of error locating.

To overcome these problems, we propose a novel error locating approach that is
based on iterative witness searching, to improve the accuracy of error locating.
We measure the distance between two state transition traces by the difference
between the assignments to their control flow predicates. With this distance
metric, we search for a witness as similar as possible to a counterexample. The
predicates that take on different assignments are then appended to the tail of a
prioritized list. By running this witness-searching algorithm iteratively, we
obtain a prioritized list of predicates. A predicate with higher priority is more
likely to be the actual error.

The main advantage of our technique is that the prioritized predicate list
itself is the result of error locating, so there is no need to perform
breadth-first code checking. We thus avoid the impact of MNW and of wide fan-in
nodes.

We implement our algorithm in NuSMV [12]. Moreover, it is straightforward
to implement our algorithm for other languages such as Verilog. The experimental
results show that our approach is much more accurate than that of [2].

The remainder of the paper is organized as follows. Section 2 presents
background material. Section 3 describes the impact of MNW and of wide fan-in
nodes. Section 4 presents the algorithm that locates errors in counterexamples.
Section 5 presents the experimental results of our approach and compares them
to those of [2]. Section 6 reviews related work. Section 7 concludes with a note
on future work.

2 Preliminaries 

2.1 Counterfactual, Distance Metrics, and Causal Dependence

A common intuition in error explanation and error localization is that
successful executions that closely resemble a faulty run can shed considerable
light on the cause of the error [1,2,4,9]. David Lewis [6] proposes a theory of
causality based on counterfactuals, which provides a justification for this
intuition. Lewis holds that a cause is something that makes a difference: if the
cause c had not been, the effect e would not have been. Lewis equates causality
to an evaluation based on distance metrics between possible worlds. We present
the definitions of counterfactual and causal dependence below:

Definition 1 (Counterfactual). Assume A and C hold in world W. Then the
counterfactual $A \mathrel{\Box\!\!\rightarrow} C$ holds in world W w.r.t.
distance metric d iff there exists a world W' such that:

1. A and C hold in world W';

2. For every world W'' in which A holds and C does not hold, d(W, W') < d(W, W'').

Definition 2 (Causal Dependence). If predicates c and e both hold in world
W, then e causally depends on c in world W iff
$\neg c \mathrel{\Box\!\!\rightarrow} \neg e$ holds in W.

We express "e causally depends on c in world w" as the following formula:

$$c(w) \wedge e(w) \wedge \exists w' \Big( \neg c(w') \wedge \neg e(w') \wedge \forall w'' \big( (\neg c(w'') \wedge e(w'')) \Rightarrow (d(w,w') < d(w,w'')) \big) \Big) \qquad (1)$$

Here c(w) means that predicate c is true in world w, and e(w) means that
predicate e is true in world w. Formula (1) states that e causally depends on c
iff some execution that removes both c and e is nearer to the original execution
than any execution that removes c only.
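
Formula (1) can be evaluated directly when the set of possible worlds is finite. The following is a minimal sketch under that assumption; the function name `causally_depends`, the explicit list of worlds, and the Hamming distance used in the example are illustrative choices and are not part of our implementation.

```python
def causally_depends(w, worlds, c, e, d):
    """Evaluate Formula (1): does e causally depend on c in world w?

    worlds : finite iterable of candidate worlds (an assumption of this sketch)
    c, e   : predicates over worlds (functions returning bool)
    d      : distance function between two worlds
    """
    worlds = list(worlds)
    # c(w) and e(w) must both hold in the reference world.
    if not (c(w) and e(w)):
        return False
    # Worlds that remove the cause c but keep the effect e.
    effect_only = [x for x in worlds if not c(x) and e(x)]
    # Look for a world w1 that removes both c and e and is strictly nearer
    # to w than every world in effect_only.
    for w1 in worlds:
        if c(w1) or e(w1):
            continue
        if all(d(w, w1) < d(w, w2) for w2 in effect_only):
            return True
    return False


def hamming(x, y):
    """Number of positions in which two equal-length tuples differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


# Worlds are pairs of bits (a, b).
WORLDS = [(0, 0), (0, 1), (1, 0), (1, 1)]
cause = lambda x: x[0] == 1
effect_1 = lambda x: x[0] == 1 or x[1] == 1
effect_2 = lambda x: x[1] == 0

print(causally_depends((1, 0), WORLDS, cause, effect_1, hamming))
print(causally_depends((1, 0), WORLDS, cause, effect_2, hamming))
```

In the first call the nearest world without the cause also loses the effect, so the dependence holds; in the second call the effect survives in the nearest world without the cause, so it does not.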

2.2 Pseudo Boolean SAT 

Pseudo Boolean SAT (PBS) [14] introduces two new types of constraints into SAT:

1. Pseudo Boolean constraints of the form $\sum_i c_i x_i \le n$, with
$n, c_i \in \mathbb{N}$ and $x_i \in \mathbb{B}$;

2. Pseudo Boolean optimization goals of the form $\min(\sum_i c_i x_i)$ and
$\max(\sum_i c_i x_i)$.

PBS can efficiently handle these two types of constraints alongside CNF
constraints. We use PBS to search for the most similar witness.
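
To make the two constraint types concrete, the following sketch solves a tiny instance by exhaustive enumeration. It only illustrates the problem that a PBS solver handles; it is not how PBS [14] works internally, and the function name `solve_pb` and its input format are assumptions made for this example.

```python
from itertools import product


def solve_pb(num_vars, clauses, pb_constraints, objective):
    """Minimize sum(objective[i] * x_i) subject to CNF clauses and
    pseudo Boolean constraints sum(coeffs[i] * x_i) <= bound.

    clauses        : list of clauses, each a list of DIMACS-style literals
                     (v means x_v = 1, -v means x_v = 0, variables start at 1)
    pb_constraints : list of (coeffs, bound) pairs
    objective      : list of coefficients of the minimization goal
    Returns (best_value, assignment) or None if the instance is unsatisfiable.
    """
    best = None
    for bits in product((0, 1), repeat=num_vars):
        # CNF part: every clause needs at least one satisfied literal.
        cnf_ok = all(
            any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
            for clause in clauses
        )
        if not cnf_ok:
            continue
        # Pseudo Boolean inequalities.
        pb_ok = all(
            sum(c * x for c, x in zip(coeffs, bits)) <= bound
            for coeffs, bound in pb_constraints
        )
        if not pb_ok:
            continue
        # Optimization goal: keep the first assignment with the smallest value.
        value = sum(c * x for c, x in zip(objective, bits))
        if best is None or value < best[0]:
            best = (value, bits)
    return best


# (x1 or x2) and (not x1 or x3), at most two variables set, minimize their sum.
result = solve_pb(3, [[1, 2], [-1, 3]], [([1, 1, 1], 2)], [1, 1, 1])
print(result)
```

The enumeration is exponential in the number of variables, so it is only suitable for illustrating the semantics of the constraints.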

2.3 Bounded Model Checking, Witness, and Counterexample 

Bounded model checking (BMC) [16] is a technique for finding bounded-length
counterexamples to LTL properties. Recently, BMC has been applied with great
success by formulating it as a SAT instance and solving it with efficient SAT
solvers such as zchaff [20].

A general discussion of BMC is fairly complex, so we refer the reader to
A. Biere's excellent paper [16] for the details of BMC. For simplicity, we only
discuss invariant properties here.

Given a system with a Boolean formula $I$ representing the initial states, a
Boolean formula $T_i(X_i, W_i, X_{i+1})$ representing the i-step transition
relation, and an invariant with a Boolean formula $P_k$ representing the failure
states at step k, the length-k BMC problem is posed as a SAT instance in the
following manner:

$$F = I \wedge P_k \wedge \bigwedge_{0 \le i < k} T_i(X_i, W_i, X_{i+1})$$

For a state transition path $\pi$ of length k, if formula f always holds on it,
we write $\pi \models_k f$.
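
The structure of the instance F can be illustrated on an explicit-state system. The sketch below searches for a path that satisfies the initial condition, k transition steps, and the failure condition at step k. It is an assumption-laden illustration: it enumerates states instead of building a SAT instance, it omits the inputs $W_i$, and the name `bmc_invariant` is hypothetical.

```python
def bmc_invariant(states, init, trans, bad, k):
    """Return a path s_0..s_k with init(s_0), trans(s_i, s_{i+1}) for all
    0 <= i < k, and bad(s_k), or None if no such path exists."""
    states = list(states)

    def extend(path):
        # A complete path has k transitions, i.e. k + 1 states.
        if len(path) == k + 1:
            return path if bad(path[-1]) else None
        for nxt in states:
            if trans(path[-1], nxt):
                found = extend(path + [nxt])
                if found is not None:
                    return found
        return None

    for s0 in states:
        if init(s0):
            found = extend([s0])
            if found is not None:
                return found
    return None


# A 2-bit counter that starts at 0; the failure state is 3.
is_init = lambda s: s == 0
step = lambda s, t: t == (s + 1) % 4
is_bad = lambda s: s == 3

print(bmc_invariant(range(4), is_init, step, is_bad, 3))
print(bmc_invariant(range(4), is_init, step, is_bad, 2))
```

With bound 3 the search returns the counterexample path, while with bound 2 no failure state is reachable and the result is None.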

We give our own definitions of witness and counterexample below:

Definition 3 (Bounded Witness). For an LTL formula f and bound k, if a state
transition trace $\pi$ satisfies $\pi \models_k f$, then we call $\pi$ a bounded
witness of f.

Definition 4 (Counterexample). For an LTL formula f and bound k, if a state
transition trace $\pi$ satisfies $\pi \models_k \neg f$, then we call $\pi$ a
counterexample of f.

For Definition 3, when the bound k can be deduced from the context, we
omit it and just call $\pi$ a witness.



3 Multiple Nearest Witnesses Effect 

Shen [1,2] and A. Groce [4,9] describe error locating approaches based on
nearest witness searching. For a counterexample C, they search for only one
nearest witness W. Then, starting from the difference $\Delta$ between W and C,
they perform breadth-first code checking to locate the actual error.

However, when we analyzed the experimental results of [2], we found that the
nearest witness is not unique. For a counterexample C, assume that the set of
all nearest witnesses is $\{W_i \mid 0 \le i \le n-1\}$, that the difference
between $W_i$ and C is $\Delta_i$, and that the distance between $W_i$ and C is
$|\Delta_i|$. Assume that the actual error e belongs to only one arbitrary
$\Delta_i$, and that the witness obtained by [2] is $W_j$. In most cases
$i \ne j$, which means $e \notin \Delta_j$. So the algorithms of [1,2,4,9] need
to perform breadth-first code checking to locate the actual error.

Under certain circumstances, $\Delta_j$ is far from the actual error e. Thus,
the breadth-first code checking must search very deeply into the code.

At the same time, if we meet a node with a large number of fan-ins during
breadth-first code checking, we also need to traverse a large fraction of the
code.

So far, this section has contained only bad news, but there is good news as
well. While analyzing the results of [2], we found that although the witness
$W_j$ obtained by [2] does not always contain the actual error e, after running
the witness searching algorithm iteratively no more than 4 times, we can always
find the actual error in $\Delta_j$.

We therefore propose the iterative witness searching algorithm in Sect. 4. This
algorithm performs fairly well in practice, and improves significantly on our
previous work [1,2].

4 Iterative Witness Searching Algorithm

We first introduce the overall algorithm flow in Sect. 4.1, and then describe
every step of this algorithm in detail.

Fig. 1. Overall flow of our error locating algorithm



4.1 Overall Algorithm Flow 

As shown in Figure 1, this algorithm contains two phases:

1. Generation of Basic Counterexample: This phase corresponds to traditional
bounded model checking (BMC). Before performing BMC, we need to extract
predicates. For every control branch, we generate a predicate. This predicate
takes on value 1 at its corresponding control branch only. When BMC generates
the basic counterexample, it also assigns values to these predicates. With
these predicates and their assignments, we can construct the control flow of
the counterexample.

2. Error Locating: In this phase, we run the following three steps iteratively:

a) Witness Searching: search for a witness $W_i$ as similar as possible to the
basic counterexample, and obtain the set of predicates $\Delta_i$ that take on
different values in the counterexample and the witness.

b) Predicate Filtering: eliminate from $\Delta_i$ all predicates irrelevant to
the violation of LTL formula f.

c) Witness Blocking: prevent $W_i$ from being generated again by future
iterations.

The iteration in the second phase is the major difference between our approach
and those of [1,2,4,9].

4.2 Predicate Extraction 

In Predicate Extraction, we generate a predicate for every control branch. This
predicate takes on value 1 at its corresponding control branch only, and value
0 at the other control branches. Because we implement our algorithm on NuSMV,
we present the example in NuSMV syntax in Figure 2. It is straightforward to
extend Predicate Extraction to other languages.

a) NuSMV assignment

    case
      condition0 : data0;
      condition1 : data1;
      condition2 : data2;
    esac

b) 1st predicate

    predicate0 :=
    case
      condition0 : 1;    -- boxed in the figure
      condition1 : 0;
      condition2 : 0;
    esac

c) 2nd predicate

    predicate1 :=
    case
      condition0 : 0;
      condition1 : 1;    -- boxed in the figure
      condition2 : 0;
    esac

d) 3rd predicate

    predicate2 :=
    case
      condition0 : 0;
      condition1 : 0;
      condition2 : 1;    -- boxed in the figure
    esac

Fig. 2. NuSMV conditional assignment and all extracted predicates

The conditional assignment statement of Figure 2a contains three control
branches. We insert one predicate for each control branch, as shown in Figures
2b-2d. Every predicate takes on value 1 at its corresponding branch only, as
marked by the rectangles in Figure 2.
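
The extraction step is mechanical and can be sketched as a small text generator. The following is an illustration only: the function name `extract_predicates` and the `predicate` prefix are assumptions, and the sketch emits NuSMV-style case expressions as strings rather than operating on the NuSMV parse tree as our implementation does.

```python
def extract_predicates(conditions, prefix="predicate"):
    """For a conditional assignment with the given branch conditions, return
    one case expression per branch that is 1 on that branch and 0 elsewhere."""
    predicates = {}
    for i in range(len(conditions)):
        lines = ["case"]
        for j, cond in enumerate(conditions):
            # The i-th predicate is 1 only on the i-th control branch.
            lines.append(f"  {cond} : {1 if i == j else 0};")
        lines.append("esac;")
        predicates[f"{prefix}{i}"] = "\n".join(lines)
    return predicates


preds = extract_predicates(["condition0", "condition1", "condition2"])
for name, body in preds.items():
    print(f"{name} :=\n{body}\n")
```

For the three-branch assignment of Figure 2a this produces three predicates, one per branch, matching Figures 2b to 2d.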

4.3 Witness Searching 

After BMC generates the basic counterexample $\pi_1$, it assigns values to all
control flow predicates extracted in the last section. With these predicates and
their assignments, we can construct the control flow of the basic counterexample.

In this section, we present the algorithm that searches for the nearest witness.
Before that, we must first define the predicate distance metric.

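Definition 5 (Predicate Distance). Assume that state transition traces $\pi_1$
and $\pi_2$ contain a common predicate set
$TAG = \{tag_0, tag_1, \ldots, tag_n\}$. The assignment to TAG in $\pi_1$ is
$\{tag_0 = a_0, tag_1 = a_1, \ldots, tag_n = a_n\}$, and the assignment to TAG
in $\pi_2$ is $\{tag_0 = b_0, tag_1 = b_1, \ldots, tag_n = b_n\}$. Then the
distance between $\pi_1$ and $\pi_2$ is defined as:

$$d(\pi_1, \pi_2) = \sum_{i=0}^{n} \Delta(i), \qquad \text{with } \Delta(i) = \begin{cases} 0 & \text{if } a_i = b_i \\ 1 & \text{if } a_i \ne b_i \end{cases}$$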
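
The predicate distance is a Hamming distance over the assignments to TAG. A minimal sketch, assuming that each trace is summarized by a dictionary from predicate name to its assigned value (the dictionary representation and the name `predicate_distance` are choices made for this example):

```python
def predicate_distance(assign_1, assign_2):
    """Number of predicates in the common set TAG that are assigned
    differently in the two traces (Definition 5)."""
    if assign_1.keys() != assign_2.keys():
        raise ValueError("both traces must assign the same predicate set TAG")
    # Delta(i) is 1 when a_i != b_i and 0 otherwise.
    return sum(1 for tag in assign_1 if assign_1[tag] != assign_2[tag])


pi_1 = {"tag0": 1, "tag1": 0, "tag2": 1}
pi_2 = {"tag0": 1, "tag1": 1, "tag2": 0}
print(predicate_distance(pi_1, pi_2))
```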

With the above definition of the distance metric, we define the nearest witness as:

Definition 6 (Nearest Witness). Assume $\pi_1$ is the basic counterexample of
LTL formula f and its bound is k. $\pi_2$ is a nearest witness of $\pi_1$ iff:

1. $\pi_2$ is a counterexample of $\neg f$, i.e. $\pi_2 \models_k \neg\neg f$;
this means that $\pi_2$ does not violate formula f within bound k;

2. $d(\pi_1, \pi_2) \ge 1$;

3. For any $\pi_2'$ that satisfies entries 1 and 2, $d(\pi_1, \pi_2) \le d(\pi_1, \pi_2')$;

4. For any counterexample $\pi_3$ of formula f, $d(\pi_2, \pi_3) \ge 1$.
We now present our witness-searching algorithm based on the above definition:

Algorithm 1 Nearest Witness Searching Algorithm

1. Run the NuSMV command gen_ltlspec_bmc_onepb to generate CNF for assertion
$\neg f$ and bound k;

2. According to entry 2 of Definition 6, encode $d(\pi_1,\pi_2) \ge 1$ as a PBS
inequality;

3. According to entry 3 of Definition 6, encode the minimization of
$d(\pi_1,\pi_2)$ as the optimization goal of PBS;

4. Solve the above three constraints with PBS to obtain $\pi_2$. This ensures
that $\pi_2$ complies with entries 1, 2 and 3 of Definition 6;

5. Run the NuSMV command gen_ltlspec_bmc_onepb to generate CNF for assertion
f and bound k;

6. According to entry 4 of Definition 6, encode $d(\pi_2,\pi_3) \le 0$ as a PBS
inequality;

7. Solve the above two constraints with PBS, and make sure that the result is
UNSATISFIABLE. This ensures that $\pi_2$ complies with entry 4 of Definition 6.

With $\pi_2$ obtained by the above algorithm, the set of predicates that take
on different values in counterexample $\pi_1$ and witness $\pi_2$ is denoted by:

$$Error = \{ tag_i \mid tag_i \in TAG \text{ and } \Delta(i) \ne 0 \} \qquad (2)$$
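
The four entries of Definition 6 and the set Error of (2) can be sketched over explicitly enumerated traces. This is a brute-force illustration under strong assumptions: the candidate witnesses and counterexamples are given as lists of predicate assignments, whereas Algorithm 1 obtains them symbolically through NuSMV and PBS. All function names below are hypothetical.

```python
def distance(a, b):
    """Predicate distance of Definition 5 over dictionaries tag -> value."""
    return sum(1 for tag in a if a[tag] != b[tag])


def nearest_witness(cex, witnesses, counterexamples):
    """Pick a nearest witness of cex following Definition 6.

    witnesses       : assignments of traces that satisfy f (entry 1)
    counterexamples : assignments of traces that violate f
    """
    best = None
    for w in witnesses:
        dist = distance(cex, w)
        if dist < 1:
            continue  # entry 2: d(pi_1, pi_2) >= 1
        if any(distance(w, c) == 0 for c in counterexamples):
            continue  # entry 4: no counterexample shares this assignment
        if best is None or dist < best[0]:
            best = (dist, w)  # entry 3: keep the minimum distance
    return None if best is None else best[1]


def error_set(cex, witness):
    """The set Error of (2): predicates assigned differently in both traces."""
    return {tag for tag in cex if cex[tag] != witness[tag]}


cex = {"p0": 1, "p1": 0, "p2": 1}
witnesses = [
    {"p0": 0, "p1": 1, "p2": 0},
    {"p0": 1, "p1": 0, "p2": 0},
    {"p0": 1, "p1": 0, "p2": 1},
]
w = nearest_witness(cex, witnesses, [cex])
print(w, error_set(cex, w))
```

The third candidate shares its predicate assignment with the counterexample and is therefore rejected; the second candidate is the nearest one.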

Theorem 1 shows that at least one predicate in Error is the cause of $\neg f$.

Theorem 1. $\neg f$ causally depends on $\delta(\pi_1) = \bigvee_{tag_i \in Error} (tag_i == a_i)$.

Proof. First, in the basic counterexample $\pi_1$, $tag_i == a_i$ holds for all
$tag_i \in Error$, so $\delta(\pi_1)$ is true in $\pi_1$.

Because $\pi_1$ is a counterexample, $\neg f(\pi_1)$ is true.

With the above conclusions, and replacing $\exists w'$ of (1) with $\pi_2$, we
can reduce (1) to the following formula:

$$\neg\delta(\pi_2) \wedge f(\pi_2) \wedge \forall w'' \big( (\neg\delta(w'') \wedge \neg f(w'')) \Rightarrow (d(\pi_1,\pi_2) < d(\pi_1,w'')) \big) \qquad (3)$$

Because $\Delta(i) \ne 0$ holds for all $tag_i \in Error$, $\neg\delta(\pi_2)$
is obviously true.

At the same time, because $\pi_2$ is a witness, $f(\pi_2)$ is true.

Then we can reduce (3) to the following formula:

$$\forall w'' \big( (\neg\delta(w'') \wedge \neg f(w'')) \Rightarrow (d(\pi_1,\pi_2) < d(\pi_1,w'')) \big) \qquad (4)$$

We prove (4) by contradiction. Assume (4) is false; then there exists a $w''$
such that the following formula holds:

$$(\neg\delta(w'') \wedge \neg f(w'')) \wedge (d(\pi_1,\pi_2) \ge d(\pi_1,w'')) \qquad (5)$$

Because $\neg f(w'')$ and $f(\pi_2)$ both hold, according to entry 4 of
Definition 6 we can deduce that $w''$ and $\pi_2$ have different assignments to
TAG. We discuss the two possible cases:

1. There exists a $tag \in Error$ such that $tag(w'') \ne tag(\pi_2)$. Then
$tag(w'') == tag(\pi_1)$ must hold, which contradicts $\neg\delta(w'')$;

2. There exists a $tag \notin Error$ such that $tag(w'') \ne tag(\pi_2)$. Since
$\pi_1$ and $\pi_2$ agree on every tag outside Error, $tag(w'') \ne tag(\pi_1)$
and $\neg\delta(w'')$ must both hold. This leads to
$d(\pi_1,\pi_2) < d(\pi_1,w'')$, which contradicts
$d(\pi_1,\pi_2) \ge d(\pi_1,w'')$.

Therefore, Formula (5) does not hold, and $\neg f$ must causally depend on the
predicate $\delta(\pi_1) = \bigvee_{tag_i \in Error} (tag_i == a_i)$.

4.4 Predicate Filtering 

In Error given by (2), many predicates are irrelevant to the violation of LTL
formula f. They are byproducts of constructing witness $\pi_2$. We perform a
Dynamic Cone of Influence algorithm to eliminate them from Error.

We first present the definition of the Dynamic Dependence Set.

Definition 7 (Dynamic Dependence Set). Assume the conditional assignment
statement of variable A is as shown in Figure 2a, its i-th condition formula
is $C_i$, and its i-th data formula is $D_i$. Then the i-th dynamic dependence
set of variable A is

$$Dep(A,i) = \{ x \mid x \text{ is a state variable and } x \text{ is a sub-formula of } \neg C_n \text{ or } C_i \text{ or } D_i \}$$

With the above definition, we eliminate irrelevant variables with the following
algorithm.

Algorithm 2 Dynamic Cone of Influence Algorithm

DCOI {
    N1 := {all variables of formula f at time frame k}
    do {
        N1 := DCOIRecur(N1)
        Node := Node ∪ N1
    } until N1 == ∅
    Node is the Dynamic Cone of Influence of formula f
}

DCOIRecur(C) {
    Node := ∅
    for each variable v_i in C {
        if (variable v_i is not a primary input) {
            let the n-th branch predicate of v_i be 1 in π1
            Node := Node ∪ Dep(v_i, n)
        }
    }
    return Node
}

The traditional Static Cone of Influence algorithm generates a large node set,
which contains all nodes connected to formula f by a data dependence path.

Our Dynamic Cone of Influence algorithm generates a much smaller node set,
which contains only the nodes that actually affect f in counterexample $\pi_1$.
As the line that updates Node in DCOIRecur of Algorithm 2 states, only
$Dep(v_i, n)$, the dependence set of the branch actually taken in $\pi_1$, is
added to the Node set.
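
The following is a minimal executable sketch of the Dynamic Cone of Influence computation. It rests on several simplifying assumptions that are not part of Algorithm 2: time frames are ignored, the dependence sets Dep are supplied as a dictionary, and a visited check is added so that the loop terminates on cyclic dependences. The name `dynamic_coi` and the data layout are hypothetical.

```python
def dynamic_coi(roots, taken_branch, dep, primary_inputs):
    """Collect the variables that actually influence the roots.

    roots          : variables of formula f
    taken_branch   : variable -> index n of the branch whose predicate is 1
                     in the counterexample
    dep            : (variable, branch index) -> set of variables Dep(v, n)
    primary_inputs : variables without an assignment statement
    """
    node = set()
    frontier = set(roots)
    while frontier:
        reached = set()
        for v in frontier:
            if v in primary_inputs:
                continue
            # Only the dependence set of the branch taken in the
            # counterexample is followed, not those of the other branches.
            reached |= dep[(v, taken_branch[v])]
        reached -= node  # assumption of this sketch: skip visited variables
        node |= reached
        frontier = reached
    return node


taken = {"out": 1, "a": 0, "b": 0}
dependence = {
    ("out", 0): {"a"},
    ("out", 1): {"b", "in1"},
    ("a", 0): {"in2"},
    ("b", 0): {"b", "in1"},
}
cone = dynamic_coi({"out"}, taken, dependence, {"in1", "in2"})
print(sorted(cone))
```

In this example a static cone of influence would also contain `a` and `in2`, because they feed branch 0 of `out`; the dynamic cone omits them since branch 1 is the one taken.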
Now, assume the current iteration is the n-th iteration. After Predicate
Filtering, the set of predicates that are relevant to the violation of formula
f is:

$$R_n = \{ p \mid p \in Error, \text{ and } \exists v \in Node \text{ such that } p \text{ is a control predicate of variable } v \} \qquad (6)$$

After running the whole algorithm iteratively, we obtain multiple $R_n$. Their
union $\bigcup_n R_n$ forms a prioritized list: if $n < m$, $p \in R_n$ and
$q \in R_m$, then the priority of p is higher than that of q. A predicate with
higher priority is more likely to be the actual error.
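
The construction of the prioritized list from the per-iteration sets can be sketched as follows. The ordering of predicates within one iteration is not specified above, so the alphabetical order used here is an assumption, as is the function name `prioritized_list`.

```python
def prioritized_list(rounds):
    """Merge the filtered sets R_1, R_2, ... into one list.

    rounds : list of sets, rounds[n - 1] is R_n
    Returns (predicate, n) pairs; predicates from earlier iterations come
    first, and a predicate keeps the priority of its first occurrence.
    """
    seen = set()
    ordered = []
    for n, r_n in enumerate(rounds, start=1):
        for p in sorted(r_n):  # order inside one iteration: an assumption
            if p not in seen:
                seen.add(p)
                ordered.append((p, n))
    return ordered


ranking = prioritized_list([{"p3", "p1"}, {"p1", "p7"}])
print(ranking)
```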



4.5 Witness Blocking 

To prevent the witness $\pi_2$ of the current iteration from being generated
again, we must add $\pi_2$ as a blocking constraint to the witness searching
algorithm.

We use a blocking set MASK to record $\pi_2$, and modify Algorithm 1 to prevent
$\pi_2$ from being generated again. Entry 4 of Algorithm 1' is this new blocking
constraint.

Algorithm 1' Modified Nearest Witness Searching Algorithm

1. Run the NuSMV command gen_ltlspec_bmc_onepb to generate CNF for assertion
$\neg f$ and bound k;

2. According to entry 2 of Definition 6, encode $d(\pi_1,\pi_2) \ge 1$ as a PBS
inequality;

3. According to entry 3 of Definition 6, encode the minimization of
$d(\pi_1,\pi_2)$ as the optimization goal of PBS;

4. For every witness $\pi \in MASK$, encode $d(\pi, \pi_2) > 0$ as a PBS
inequality, so that these witnesses are not generated by the current iteration;

5. Solve the above four constraints with PBS to obtain $\pi_2$. This ensures
that $\pi_2$ complies with entries 1, 2 and 3 of Definition 6;

6. Run the NuSMV command gen_ltlspec_bmc_onepb to generate CNF for assertion
f and bound k;

7. According to entry 4 of Definition 6, encode $d(\pi_2,\pi_3) \le 0$ as a PBS
inequality;

8. Solve the above two constraints with PBS, and make sure that the result is
UNSATISFIABLE. This ensures that $\pi_2$ complies with entry 4 of Definition 6.
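
The effect of the blocking set can be sketched over an explicit list of candidate witnesses. As before, this is only an illustration under assumptions: the candidates are enumerated by hand instead of being produced by NuSMV and PBS, entry 4 of Definition 6 and Predicate Filtering are omitted, and the names `iterate_witness_search` and `mask` are hypothetical.

```python
def distance(a, b):
    """Predicate distance of Definition 5 over dictionaries tag -> value."""
    return sum(1 for tag in a if a[tag] != b[tag])


def iterate_witness_search(cex, witnesses, max_iter):
    """Repeat the nearest witness search, blocking every witness found so far.
    Returns the list of difference sets, one per iteration."""
    mask = []     # blocking set MASK
    results = []  # difference set of each iteration, in priority order
    for _ in range(max_iter):
        candidates = [
            w for w in witnesses
            if distance(cex, w) >= 1                   # entry 2 of Definition 6
            and all(distance(m, w) > 0 for m in mask)  # blocking constraint
        ]
        if not candidates:
            break
        # Minimization goal; ties are resolved by list order in this sketch.
        w = min(candidates, key=lambda x: distance(cex, x))
        results.append({tag for tag in cex if cex[tag] != w[tag]})
        mask.append(w)
    return results


cex = {"p0": 1, "p1": 0, "p2": 1}
candidates = [
    {"p0": 1, "p1": 0, "p2": 0},
    {"p0": 0, "p1": 0, "p2": 1},
    {"p0": 0, "p1": 1, "p2": 0},
]
found = iterate_witness_search(cex, candidates, 4)
print(found)
```

The first two candidates are both at distance 1 from the counterexample, which is exactly the MNW situation of Sect. 3: a single search returns only one of them, while the iteration with blocking visits both.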

5 Experiment Result and Analysis 

First, we briefly introduce two different score functions for evaluating error
localization techniques in Sect. 5.1: one for the algorithm of [2], and another
for the algorithm of this paper. Next, we present the experimental results and
analysis in Sect. 5.2 and 5.3.

5.1 Score Functions for Evaluating Error Localization Techniques

To compare different error locating techniques, a quantifiable metric must be
built. We call such a metric a score function. Because our error locating result
differs from that of [1,2,4,9], our score function also differs from theirs.
However, both functions have the same meaning:

After locating the actual error under the guidance of the error locating
result, what percentage of the program statements has not been checked?
Obviously, a higher score means more accurate error locating.

We first introduce the score function of [1,2,4,9]:

Consider a breadth-first search of the program dependence graph (PDG)
starting from the set of nodes in the potential error report R. Call R a
layer, $BFS_0$. Then define $BFS_{n+1}$ as a set containing $BFS_n$ and all
nodes reachable in one direct step in the PDG from $BFS_n$. Let $BFS_*$
be the smallest layer containing at least one error node. Then the score
for the algorithm of [2] is $1 - |BFS_*| / |PDG|$.

We then describe the score function of this paper:

Starting from the head of the prioritized predicate list, we check each
predicate to determine whether it is the actual error. Assume that the actual
error is contained in the result of the n-th iteration, $R_n$; then
$\bigcup_{i \le n} R_i$ is the minimal set of checked predicates that contains
the actual error. The score of our algorithm is then
$1 - |\bigcup_{i \le n} R_i| / |PDG|$.

With the above two score functions, we can compare the results of this paper
with those of our previous work [2].
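
Both score functions share the same arithmetic, which the following sketch makes explicit. The normalizing total is an assumption here: we take it to be the number of conditional assignment statements in the unrolled model, 41 per time frame times the bound N of the counterexample, which is consistent with the scores reported in Table 1. The function name `score` is hypothetical.

```python
def score(checked, total):
    """Percentage of statements that did not have to be checked.

    checked : size of the set inspected before the error is found
              (|BFS_*| for [2], the size of the union of R_i for this paper)
    total   : total number of statements (assumed to be 41 * N below)
    """
    return 100.0 * (1.0 - checked / total)


ASSIGNMENTS_PER_FRAME = 41  # conditional assignments in the flattened model

# Error D1 of Table 1: bound 6, 18 nodes checked by [2], 22 predicates by ours.
total_d1 = ASSIGNMENTS_PER_FRAME * 6
print(round(score(18, total_d1), 1))
print(round(score(22, total_d1), 1))
```

A smaller checked set gives a higher score, so the two approaches can be compared even though one inspects PDG nodes and the other inspects predicates.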

5.2 Experiment Result 

The original Gigamax cache coherence protocol [17] is distributed with NuSMV
[12]. We convert all its CTL assertions into equivalent LTL versions so that we
can check them with the BMC package of NuSMV. The property used to detect errors
is: G !(p0.writeable & p1.writeable). This means that two caches can never be
writeable at the same time.

We insert 10 errors into it. Five of them are data flow errors. The other five
are control flow errors.

The NuSMV source code contains 189 lines. After flattening, there are 458 lines
and 41 conditional assignments.

All experiments are performed on a 1 GHz Pentium 3.

In Table 1, we compare the results of this paper with those of [2]. The third
column is the bound N of the basic counterexample, so the total number of
conditional assignment statements is 41*N. The 4th column is the size of the
smallest layer containing at least one error node. The 5th column is the score
of the algorithm of [2], and the 6th column is the run time of [2].

The number of iterations of our algorithm is shown in the 7th column. The size
of the minimal set of checked predicates that contains the actual error is
shown in the 8th column. The score of our algorithm is shown in the 9th column,
and the run time of our algorithm is shown in the 10th column.



Table 1. Experimental results of this paper and of [2]

                               Result of [2]                Result of this paper
Error type     Error  Bound    |BFS_*|  Score(%)  Time     Num of  |∪ R_i|  Score(%)  Time
                      of Cex                               Iter
Data flow      D1     6        18       92.7      19.56    3       22       91        100
error          D2     4        6        96.3      16.72    2       6        96.3
               D3     5        2        99        25.13    1       2        99
               D4     5        14       93.1      21.92    3       11       94.6
               D5     5        7        96.5      14.25    2       3        98.5
Control flow   R1     6        8        96.7      19.22    3       6        97.5
error          R2     5        24       88.3      22.86    3       17       91.7
               R3     6        24       90.2      23.37    2       11       95.5
               R4     2        18       78        7.12     2       5        94
               R5     5        26       87.3      17.71    2       8        96


5.3 Result Analysis 

As shown in Table 1, all scores of this paper are higher than 90%. This is
significantly higher than the scores of [2]. From Table 1, we can conclude that:

1. All errors that are accurately located in [2] are also accurately located
in this paper.

2. All errors that are poorly located in [2], such as R2, R3, R4 and R5, are
accurately located in this paper.

3. All actual errors can be located within 3 iterations.

Our new algorithm improves the accuracy of error locating in two respects:

1. We no longer need to perform breadth-first code checking, so we avoid the
impact of multiple nearest witnesses, which is described in detail in Sect. 3;

2. We also avoid the impact of wide fan-in nodes.

6 Related Work 

It is a tedious task to understand the complex counterexamples generated by
model checking complex hardware systems. Therefore, how to automatically
extract useful information to aid the understanding of counterexamples is an
area of active research.

Research in this field can be divided into three categories:



6.1 Counterexample Compaction 

These works focus on making the counterexample more succinct.

T. Ball [5] searches for all state graph transition edges that belong only to
the counterexample.

Jin, Ravi and Somenzi [3] propose a game-like explanation in which an adversary
tries to force the system into error. They partition a counterexample into two
types of fragments: fate fragments, which are unavoidable paths leading to the
violation of the assertion, and free will fragments, which can avoid the
violation of the assertion.

K. Ravi [3] formulates the extraction of a succinct counterexample as the
problem of finding a minimal assignment that, together with the Boolean formula
describing the model, implies the violation of the LTL formula.



6.2 Error Locating 

Error locating approaches are a much more aggressive form of counterexample
compaction. They drop the completeness requirement, and try to find even more
succinct error locating results.

In [1], Shen proposes the control predicate distance metric for the first time,
and presents a nearest witness-searching algorithm based on this metric.

In [2], Shen integrates predicate filtering into the framework of [1].

A. Groce [8] generates multiple similar successful and failing versions of a
counterexample, and analyzes their differences.

A. Groce [4,9] defines a "data flow distance" between two paths, searches for a
witness most similar to a counterexample, and then analyzes their differences to
locate the actual error.

G. Fey [11] analyzes the counterexamples of equivalence checking, generates
multiple similar counterexamples, and then locates the actual error by analyzing
what these counterexamples have in common.



6.3 Annotate Counterexample with Proof 

Several researchers also try to explain non-linear counterexamples and
witnesses with annotated proof steps.

M. Chechik [13] generates proofs for non-linear counterexamples of ACTL, and
then extends the approach to deal with fairness conditions.

D. Peled [15] generates proofs for witnesses of LTL formulas.

K. Namjoshi [19] concentrates on generating a proof of validity for a run of a
global $\mu$-calculus model checker.

Tan [18] extends Namjoshi's work to local model checking.
7 Conclusions

It is a tedious task to manually analyze the counterexamples of complex hardware
systems. We have shown how to locate the bug in a counterexample generated by a
bounded model checker. Experimental results show that our approach is highly
accurate.

Our current implementation is based on NuSMV. However, our techniques
are quite general and we are porting them to Verilog.

To locate bugs more accurately, we are considering using multiple assertions to
locate the bug instead of the "one assertion" debugging approach of this paper.

We are also considering locating bugs in loop-like counterexamples of liveness
properties.

Finally, due to the computational complexity of PBS, it is infeasible to
directly locate errors in large-scale concrete models, so we are considering
integrating an abstraction/refinement approach into our witness searching
algorithm.



Acknowledgements. We would like to thank the anonymous referees for their 
Abstract. This paper describes how formal verification technology has
been used at the Information Industrial Institute (III) for the strategic
development of the 3G telecommunication industry in Taiwan. This work
aims at developing a set of golden formal models for WCDMA protocols
that help engineers simulate, test, emulate, and synthesize various
designs more efficiently and effectively. In this paper, we first discuss
the general framework for using these golden models, and then describe
our models for the sub-layers of the 3G protocols. Finally, Red, a
model-checker/simulator for real-time systems, is used to check the
correctness and precision of our golden models.



1 Introduction 

The services over mobile technology have become indispensable in many aspects
of our everyday life. As the production of 3G handsets keeps rising, the need
for reliable embedded protocol software for WCDMA handsets, as well as for
effective and efficient verification methods, is more acute than ever. But the
WCDMA protocols are far too massive and complicated to be handled without
computer-aided tools.

Information Industrial Institute (III) is a semi-official organization in Taiwan
with the strategic goal of developing key technologies for future industry. One
of its strategic projects is the Formal Verification of 3G telecommunication
systems (FV3G). The goal of FV3G is to develop a formal verification framework
that can be used repeatedly in various tasks in the development of WCDMA
protocol systems. The framework should support the following features:

1. Automated checking of the WCDMA protocol specification.

2. Automatic test pattern generation with coverage estimation. 

3. Automated generation of high-quality codes. 

4. Minimum effort in applying the framework to various verification tasks. 

F. Wang (Ed.): ATVA 2004, LNCS 3299, pp. 470-473, 2004. 

© Springer-Verlag Berlin Heidelberg 2004
In this article, we discuss the framework of project FV3G. Our solution is based
on the following two tasks:

1. Construction of a golden model of the WCDMA protocol systems. 

2. Development of a library of specification properties in formal semantics. 

In section 2, we shall explain the framework of project FV3G. In section 3, 
we briefly present our experimental platform and the result. Section 4 is the 
conclusion. 

2 FV3G 

The framework of project FV3G is shown in Fig. 1. The input of the project is
the 3GPP specification. In our work group, the formal language SDL is used to
describe these specifications. The specification properties are derived manually
from these specifications. Once the SDL models are translated to formal
models accepted by a model checker, the specification properties can be used
for conformance testing or correctness checking. If a model satisfies the
specification properties, it can be claimed to be a golden model. The
executable code generated from the golden models therefore carries more
confidence for any further commercial release. As for reusability, the SDL
models, the golden models, and their properties can also be reused in other
system designs. We put these properties into the Specification Properties
Library (SPL) for better management.






Fig. 1. Framework of FV3G



Generally speaking, there are three major drawbacks to overcome in commonly
available verification techniques. First, commercial tools may be user-friendly
but are seldom powerful enough to serve the purpose. Second, much effort and
money is wasted on redundant work due to an imperfect testing flow. Finally,
test patterns of higher quality and in greater quantity are required to assure
product quality. The tool sets integrated into FV3G can be used to enhance the
performance of our design and verification. In Fig. 1, the blocks in the
boldface boxes are the software tools employed in this work; we briefly discuss
these tools in the following paragraphs.
1. Real-time system model-checker: Our mathematical models are written
as communicating timed automata [3] and our specification properties as TCTL
formulas. The models and properties can then be fed to various model-checkers
for distributed real-time systems. Specifically, we write the models and
specifications in the syntax of Red [5,6], which is a model-checker/simulator
for timed systems. With model-checking technology, we can verify
algorithmically whether the model satisfies the specification properties.

2. Test pattern generator with coverage: Testing means that we use input
patterns to check whether the implementations respond as expected. The quality
of the test patterns is also important for the confidence of engineers and
project managers in all stages of project development, official release, and
even commercial promotion. The TTCN generator tool converts the output of Red
to the standard TTCN format, which adds value to the software product.

3. Simulator with coverage: Simulators have been the major tools for
guaranteeing product designs in industry. The quality of a simulation can
only be as good as the precision of the models. Strategies for the generation
of simulation traces are similar to those for test pattern generation. The
quality of simulation traces is also subject to standards similar to those for
test pattern generation.

4. Automatic synthesizer: Components of the golden model can also be
translated to programs in the C language. Specifically, when the execution
times of the underlying instruction set are available, the synthesizer can help
us develop real-time programs that satisfy various timing requirements.

3 Experimental Result 

Fig. 2 illustrates the conceptual model of the RLC UM entity [1] verification.
To enhance the performance of verification, each layer has been abstracted at a
different abstraction level. According to clause 11.2 of [1], the section about
RLC UM data transfer procedures, the model for UM data transfer can be viewed
as peer-to-peer data transmission.

When the RRC layer submits data to the RLC layer, the discard function helps to
avoid buffer overflow. For example, the discard function discards data when
the buffer is full. We applied the TCTL formula below to check this property,
and both our golden model and the implementation code satisfy it.

Example 1. A TCTL formula to check whether the discard function can work
well.

forall always ( BUFFER_STATUS_FULL
    implies (forall eventually INIT_DISCARD_FUNCTION) );

Fig. 2. An experimental platform with Red



4 Conclusion 

In this paper, a verification platform, FV3G, is proposed and presented along
with newly designed golden models of RLC UM entities. Using the tool
Regional-Encoding Diagram (Red), the reachability and satisfaction results of
all experiments are as expected, and the processing time is short enough
to improve the process of WCDMA handset manufacturing. Formal models of the
other layers of the UE access stratum protocols are under development, so the
verification library should be better equipped in the near future.

Abstract. In behavioral verification and synthesis, a large number of
algebraic expressions can be generated. Their representation needs to
be compact, efficient, and precise. We designed a basic algebraic normal
form of multiple algebraic expressions. It supports non-redundant
storage and efficient expression transformation.



1 Introduction 

A system behavior can be described by an algebraic expression representing a
system-level algorithm. Term rewriting rules [1] can be applied to transform
an algebraic expression into equivalent ones. Such visiting of equivalent
expressions is called expression traversal. Traversed equivalent expressions
must be recorded for traversal backtracking, for reachability-based
equivalence checking, and for avoiding repeated traversal. Storing them in
distinct storage requires, in practical cases, $O(m\,n_{expr})$ storage
capacity. Equivalence checking between a generated expression and existing
expressions in the distinct storage scheme requires $O(m\,n_{expr})$
computational complexity. Here, $n_{expr}$ is the average expression size and
$m$ is the number of (existing) expressions to be checked against. Expression
construction requires $O(n_{expr})$ computational complexity. For equivalence
checking, the factor $m$ can be reduced to a constant by a hashing technique,
which requires hash reorganization and increases storage complexity. In
functional programming, term-rewriting techniques [2-4] are applied to
transform expressions into new ones. Since the rewriting process corresponds
to a program execution, the expressions being reduced no longer need to be
stored in the system. In contrast, behavioral verification and synthesis may
need to store all traversed expressions and check against them.

2 Basic Algebraic Normal Form of Expressions 

Firstly, common subexpression sharing among multiple expressions, a
compilation technique [5], can be used to reduce the storage requirement.
However, this technique represents commutative, associative, and distributive
binary operators in a binary digraph form. An excessive number of equivalent
expressions can then be generated, consuming tremendous storage and
computation time.

We represent a single-rooted subgraph of operations satisfying some or all 
of commutative, associative, and distributive laws in a normalized subgraph. 
Subexpressions transformed by applying these three laws are then represented 
uniquely in the shared expression graph. It results in a reduction of storage 
and computation requirements during an expression traversal process. There 
are five cases regarding commutative, associative, and distributive laws that a 
subexpression can be transformed into a unique representation: 

1. For a subgraph of only associative operations, we represent it in an associa- 
tive normal form. Subexpressions can be flattened into a 2-level subgraph 
with the same order with coefficient summation as shown in Fig. 1(a). 

2. For a subgraph of only commutative operations, we represent it in a commu- 
tative normal form. Subexpressions can be reordered in a uniquely defined 
lexicographical order recursively as is shown in Fig. 1(b). 

3. For a subgraph of abelian (commutative and associative) operations, we rep- 
resent it in an abelian normal form. Subexpressions can firstly be flattened, 
and then be reordered as shown in Fig. 1(c). 

4. For a subgraph of distributive operations, we apply the distributive law tran- 
sitively to transform it into an SOP (sum-of-product) normal form as shown 
in Fig. 1(d). When the SOP form is not a normal form in an algebraic system, 
if there exists any other normal form, for example, binary decision diagram 
[6] for logical operators, such normal form can thus be used in combination. 
If it does not exist, subexpressions still are represented in the SOP form. 
Other algebraic rules (axioms) are applied with embedded commutative, as- 
sociative, and distributive laws in the matching engine. 

5. In an abelian operation on operands of identical unary operations, if the 
abelian operation can be propagated down the subexpression and trans- 
formed as a new abelian operation, we will do so to maintain a normal form 
for such operation as shown in Fig. 1(e). 




Fig. 1. Basic algebraic normal form

The above normal forms are applied recursively to expressions represented
in a shared expression graph. Many equivalent expressions can be reduced to the
same normal form, so tremendous redundancy can be eliminated. Because
these three laws are basic laws of many algebraic systems, we call this
normal form the basic algebraic normal form.
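
As an illustration of the abelian case (flatten, then reorder), the following minimal Python sketch builds nodes in a shared expression graph. It is not the authors' implementation: the tuple representation, the names `make` and `_table`, the choice of `+` and `*` as abelian operators, and the ordering by `repr` are all assumptions made for this example, and coefficient summation is omitted.

```python
# Sketch of an abelian (commutative + associative) normal form on a shared
# expression graph. All names here are illustrative, not from the paper.

ABELIAN_OPS = {"+", "*"}   # assumed abelian operators
_table = {}                # hash-consing table: structural key -> shared node


def make(op, *args):
    """Return the unique shared node for op(args), normalizing abelian ops."""
    if op in ABELIAN_OPS:
        flat = []
        for a in args:
            # Flatten nested applications of the same operator (associativity).
            if isinstance(a, tuple) and a[0] == op:
                flat.extend(a[1:])
            else:
                flat.append(a)
        # Reorder operands into one fixed order (commutativity).
        args = tuple(sorted(flat, key=repr))
    key = (op,) + tuple(args)
    # Reuse the existing node when an identical one was built before.
    return _table.setdefault(key, key)


# a + (b + c) and (c + b) + a normalize to the same shared node.
a_plus_bc = make("+", "a", make("+", "b", "c"))
cb_plus_a = make("+", make("+", "c", "b"), "a")
```

Because both expressions reduce to one node, an equivalence check between them becomes a single identity comparison.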



3 Expression Transformation Operation 

On a normalized shared expression graph, we can perform two operations to 
support equivalent transformation of algebraic expressions: expression pattern 
matching and subexpression substitution. As shown in Fig. 2, an axiom's LHS
is represented as a directed acyclic graph. Its pattern matching is thus a graph
pattern matching similar to tree pattern matching [7-8]. A matching between an
axiom's LHS pattern and a subexpression requires two conditions: the non-leaf
part of the pattern graph must match the corresponding subgraph in the
subexpression, and every variable of the pattern must have a consistent
subexpression unification.

Fig. 2. Pattern matching on shared expression graph



When an axiom’s LHS pattern is matched, its RHS can be substituted in the 
matched subexpression with variables in RHS bound to unified subexpression 
values derived in LHS pattern matching. As shown in Fig. 3, the matched ex- 
pression consists of four parts: (1) subexpressions bound to variables in variable 
unification, (2) subexpressions in the matched expression that are not involved 
in pattern matching, (3) an internal subgraph matching the non-leaf part of the 
pattern graph, (4) a rooted subgraph connecting subexpressions of part 2 and 
part 3 to form the matched expression. 

To form the new expression, we replace part 3 with the matched axiom
RHS graph, called part 3a, and connect part 3a to the unified subexpressions as
its component subexpressions. Part 4 is replaced by part 4a, a duplicate of its
nodes that is connected to the subexpressions of part 2 and part 3a.
Subexpressions of part 1 and part 2 can be used as common subexpressions
directly.

We construct part 3a and part 4a in a bottom-up approach. At each
construction step, we examine whether the subexpression node to be constructed
exists in the current shared expression graph. If so, the existing node is
reused. Otherwise, a new node is constructed for the new subexpression. We also
apply the transformation needed to satisfy the basic algebraic normal form. The
property of non-redundancy can thus be achieved.

Fig. 3. Insert an expression from axiom RHS substitution
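
The two operations of this section can be sketched in Python as follows. This is a simplified illustration, not the paper's matching engine: expressions are nested tuples, pattern variables are strings beginning with `?`, and the helper names `node`, `match`, and `substitute` are hypothetical. Normalization is left out, and bindings are compared with `==`, which on fully shared nodes could be replaced by an identity test.

```python
_nodes = {}  # shared expression graph: structural key -> unique node


def node(op, *args):
    """Bottom-up construction step: reuse an existing node if there is one."""
    key = (op,) + args
    return _nodes.setdefault(key, key)


def match(pattern, expr, bindings=None):
    """Match an axiom LHS pattern against expr; return bindings or None."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str) and pattern.startswith("?"):
        # A variable must unify with the same subexpression everywhere.
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        bindings[pattern] = expr
        return bindings
    if isinstance(pattern, tuple):
        # The non-leaf part must agree in operator and number of operands.
        if not (isinstance(expr, tuple) and len(expr) == len(pattern)
                and expr[0] == pattern[0]):
            return None
        for p, e in zip(pattern[1:], expr[1:]):
            if match(p, e, bindings) is None:
                return None
        return bindings
    return bindings if pattern == expr else None


def substitute(rhs, bindings):
    """Build the axiom RHS bottom-up, reusing shared nodes where they exist."""
    if isinstance(rhs, str) and rhs.startswith("?"):
        return bindings[rhs]
    if isinstance(rhs, tuple):
        return node(rhs[0], *[substitute(r, bindings) for r in rhs[1:]])
    return rhs


# Distributive axiom x*(y+z) -> x*y + x*z applied to a*(b+c).
LHS = ("*", "?x", ("+", "?y", "?z"))
RHS = ("+", ("*", "?x", "?y"), ("*", "?x", "?z"))
shared_ab = node("*", "a", "b")               # already in the graph
expr = node("*", "a", node("+", "b", "c"))
result = substitute(RHS, match(LHS, expr))
```

Here `result[1]` is the pre-existing node `shared_ab` rather than a copy, which is the reuse step described above.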

4 Analysis 

For the expression pattern matching operation, since structural forms of
traversed subexpressions are represented in normal form, we only need
subexpression root node comparison to check unification equivalence. Its
computational complexity is thus only $O(n_{LHS})$ in a normalized shared
expression graph. The computational complexity of subexpression substitution
is $O(s k n_{RHS} + n_{expr}) = O(s n_{RHS} + n_{expr})$. Subexpression
substitution implicitly contains equivalence checking with all traversed
expressions. Here, $s$ is the average number of sharings per subexpression;
$s$ will not grow large when all represented expressions are generated from
the same source expression(s). $k$ is the maximum cardinality of operators
and is usually bounded by 2 in practice. $n_{expr}$, $n_{LHS}$, and $n_{RHS}$
are the sizes (numbers of nodes) of the matched expression, the left-hand
side, and the right-hand side of the matched axiom, respectively.

On storage complexity, we applied the representation in an implementation
of an algebraic equivalence verification program and ran a number of
verification test cases. Since generated expressions are highly similar and
are largely composed of common subexpressions, the size of the shared
expression graph does not scale up significantly with the number of
expressions being generated. Empirically, it shows linear storage complexity
$O(m)$, which is much better than the $O(m\,n_{expr})$ storage complexity of
the distinct expression storage scheme.

Abstract. A programmable logic controller (PLC) is a computer system
for instrumentation and control (I&C) systems, such as the control of
machinery on factory assembly lines and in nuclear power plants. If a PLC is
used to control the reactor core of a nuclear power plant, it should be
classified as safety-critical. A PLC implements various I&C logics in
software, including a real-time operating system (RTOS). Hence, the RTOS
must also be verified completely. In this paper, we apply formal methods to
the development of a safety-critical RTOS for a PLC, which is being
developed in a Korean national nuclear project: Statecharts for
specification and model checking for verification. We also give the results
of applying formal methods to the RTOS.



1 Introduction 

A practical embedded system has its own real-time operating system to manage
applications effectively. A PLC for a nuclear power plant also has an RTOS.
Thus, the RTOS of a PLC for a safety-critical system such as a nuclear power
plant must be proved safe and correct. For this reason, we apply formal
methods to RTOS development. In this paper, we use Statecharts [2] in
STATEMATE MAGNUM (I-LOGIX) for specification and perform model checking [1]
to verify the design specification of the RTOS.

In this paper, we present our experience in applying formal methods to a
safety-critical RTOS and the results of formal specification and verification.

This paper is organized as follows: Section 2 gives an overview of the RTOS
developed for a safety-critical I&C system of a nuclear power plant. Section 3
gives the results of applying formal methods to the RTOS. Section 4 concludes
the paper.

2 RTOS for Safety-Critical System in KNICS

An RTOS for a safety-critical system must be safe, robust, and reliable in any
environment, so it has to be validated and verified completely. It is known that
applying formal methods to a general operating system is difficult because of
its nondeterministic characteristics. An RTOS for an embedded system, however,
is relatively simple and deterministic. In addition, the number of running
tasks in an RTOS is fixed in advance, and the functional characteristics of the
RTOS depend on the characteristics of its running tasks. In this paper, we
verify safety and reliability by focusing on three verification properties:
schedulability, deadlock, and priority inversion [3].

2.1 RTOS of PLC in KNICS 

KNICS is a national project to develop digital I&C systems in Korea. The PLC
for nuclear power plants is an important system of KNICS. The RTOS of the PLC
in KNICS provides the following functions:

- The scheduler uses a priority-based scheduling algorithm, in which the
highest-priority task is always chosen from the ready list of tasks in one
execution step.

- The inter-task communication mechanism consists of semaphores, message
mailboxes, and message queues.

- An interrupt is a sporadic signal that preempts the CPU. The user can define
the number of interrupts and the handlers that respond to them.

- A tick is a special periodic interrupt from an external device or CPU timer
that invokes the scheduler.

- A task has its own memory area, which is fixed at compile time. In our PLC,
there are two periodic tasks at the beginning: the diagnosis process
and the display process. After that, application tasks such as controlling and
measuring agent tasks are imported as a program image by serial
communication. A task can be invoked dynamically by other tasks, and it can
always be halted or stopped by another task or by itself.
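
The scheduling rule in the first item can be sketched as below. This is only an illustration in Python: the function name `pick_next`, the representation of the ready list as `(name, priority)` pairs, the convention that a smaller number means a higher priority, and the priority values are assumptions, not details of the KNICS RTOS.

```python
def pick_next(ready_list):
    """Return the name of the highest-priority ready task, or None."""
    if not ready_list:
        return None
    # Assumption: a smaller number means a higher priority.
    return min(ready_list, key=lambda task: task[1])[0]


# Hypothetical ready list; the priority values are made up for the example.
ready = [("display", 3), ("diagnosis", 2), ("idle", 9)]
chosen = pick_next(ready)
```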

3 Applying Formal Methods to RTOS 

3.1 Formal Specification Results 

We model each part of the RTOS using Statecharts. Fig. 1 shows a priority-based
scheduler. Fig. 2 shows a periodic task model. We make two task models to verify
our RTOS. The first task model has four tasks: a periodic task, two sporadic
tasks, and an idle task. The two sporadic tasks communicate via a message
queue and are synchronized by an interrupt handler via a semaphore. The
second task model has three periodic tasks and one idle task. Two of the
three periodic tasks share a resource (a semaphore); the other runs
independently of the other tasks. In both task models, the idle task runs
when the other tasks are in the dormant or waiting state. The first model is
made mainly to verify schedulability, and the second model is made to verify
priority inversion.

Fig. 2. Scheduler



3.2 Formal Verification Using Model Checking 

First, we perform simulation on each task model. During simulation of the
first model, we found a modeling error in our Statecharts model of the RTOS.
The error is related to interrupt disabling. We had described the
interrupt-disabling signal as an event, which does not affect the whole system
continuously because an event persists for only one simulation clock. We found
that our RTOS model stops at a certain time because an interrupt that occurs
during scheduling affects the running scheduler and also invokes another
rescheduling at the same time. We therefore changed the interrupt-disable
signal into an interrupt-disable condition variable. After that, the error no
longer occurs.

Next, we perform model checking to formally verify these two models. In the
first task model, we want to verify schedulability and deadlock. To check for
deadline misses, we make a time monitor which regularly checks whether a
real-time task misses its deadline.

Fig. 3 shows that if interrupt handling time and operating system execution
time are excluded, the real-time tasks are always guaranteed to execute all of
their jobs on time.

Fig. 4 shows the results of priority inversion and deadlock verification. After
model checking, two properties in Fig. 4 are both "False", which indicates
that priority inversion exists. The last property, "Deadlock by Shared
Resource", verifies whether deadlock caused by a shared resource can
occur, and it is also "False". This means that there is no deadlock related
to shared resources.

Fig. 4. Priority Inversion and Deadlock Verification

4 Conclusion 

In this paper, we report our experience of applying formal methods to RTOS
development: we use Statecharts, which is close to state diagrams, to specify
our RTOS, and perform model checking to verify it.

Formal methods are believed to be useful and to give insight into a system.
Because most formal specification languages have mathematical notations and
complicated semantics, software engineers are reluctant to adopt them.
Therefore we use Statecharts as the specification language, because it is
close to the state diagrams that are familiar to general engineers. In our
experience, this language is easy for software engineers to adopt. Formal
verification, the other wing of formal methods, is also known to be difficult
for software engineers to apply because of the complexity of formal proof
systems. In our project, we use ModelChecker and ModelCertifier in STATEMATE
MAGNUM (I-LOGIX). In applying formal methods, we found some errors hidden in
our RTOS model, which are related to interrupt disabling and deadlock.

In the future, we will study abstraction methods to reduce state explosion and
testing methods to confirm whether model checking results are correct.

1 Introduction

The Capability Exchange Signalling (CES) protocol is a control protocol for 
multimedia communications developed by the International Telecommunication 
Union (ITU) [3]. Our goal is to verify this protocol against the set of allowable 
sequences of service primitives (i.e. user observable events), known as the service 
language. Thus the first step to verify the CES protocol is to obtain the CES 
service language. By using automata reduction techniques [2], our approach is to
extract the service language from the Occurrence Graph (OG) of the Coloured 
Petri Net (CPN) model [4] for the protocol’s service definition. The OG of a 
CPN model is a directed graph comprising all reachable states (nodes) and state 
changes (edges) of the model. 

Unfortunately the CES protocol operates over unbounded channels, giving rise
to an infinite OG. In previous work [6], we tackled this problem by including
the channel capacity, $l$, as a parameter of the CPN model and derived a
recursive formula for the parametric OG ($OG_l$). In this paper, by treating
$OG_l$ as a parametric Finite State Automaton (FSA), we derive a recursive
formula for the FSA with epsilons (i.e. non-primitive events) removed (denoted
as $FSA_{\epsilon l}$). This is an important step towards obtaining the CES
service language for arbitrarily large channel capacity. We believe this is
the first time that a recursively represented parametric non-deterministic FSA
has been transformed to its recursively represented language equivalent
epsilon-removed FSA.

Infinite state and parametric systems are active research areas [1,4,9]. [9] uses
inductive theorem proving to verify parametric systems. [4] exploits the
equivalence or symmetry of a system to obtain a condensed representation of
the state space, while [1] proposes a formalism, known as simple regular
expressions, to represent the state space. With both methods, the verification
of an infinite state system can be reduced to the verification of a finite
state system. Our approach tackles the state spaces of parametric systems from
a different direction. By exploring the regularities existing in the structure
of the OG, we provide a recursive formula for the OG and its associated
automata, in terms of the system's parameter(s). We therefore obtain the OG
and languages for a class of systems and can use them to prove properties.

This paper briefly reviews the CES service in Section 2. Section 3 states the
main theorem for the recursive formula for $FSA_{\epsilon l}$. Finally,
Section 4 summarises the results, and indicates future research directions.

Fig. 1. The general model of the CES service



2 A Brief Review of the CES Service 

Fig. 1 shows the general model of the CES service. The outgoing CES user and 
the incoming CES user correspond to the user that initiates a capability exchange 
and the user that responds to the exchange request respectively. The CES service 
is defined by the occurrence of CES service primitives at the interface between 
the CES service user and the CES service provider (comprising the CES protocol 
entities and their underlying medium). Six CES service primitives are defined 
[3]: TRANSFER.request (abbreviated as Treq), TRANSFER.indication (Tind),
TRANSFER.response (Tres), TRANSFER.confirm (Tcnf), REJECT.request
(Rreq) and REJECT.indication (Rind). User initiated and protocol initiated
REJECT.indications are represented as RindU and RindP respectively in
Section 3. While the four TRANSFER primitives are used to transfer capabilities,
the two REJECT primitives are used to reject capabilities of the peer or to 
terminate a capability transfer. 

3 A Recursive Formula for $FSA_{\epsilon l}$

To tackle the infinite OG of the CES service CPN model [6], we parameterised
this CPN model in terms of the channel capacity, $l$. We obtained a recursive
formula for $OG_l$, the parametric OG of the parameterised CPN, by exploiting its
structural regularity. To obtain the CES service language, we need to eliminate
from $OG_l$ epsilon transitions (i.e. those representing the internal operation of
the protocol) while preserving the sequences of service primitives. We treat
$OG_l$ as a parametric FSA (denoted as $FSA_{OG_l}$) by designating an initial
state and final states. Then we utilise FSA reduction algorithms [2] to remove
epsilons from $FSA_{OG_l}$, which results in $FSA_{\epsilon l}$. We can perform
these transformations for specific (small) values of $l$ by using tools such as
FSM [5]. However, we wish to obtain a symbolic result for arbitrary values of
$l$. In this paper, we focus on finding a recursive formula for
$FSA_{\epsilon l}$. We have also obtained a recursive representation for its
deterministic form $FSA_{D l}$ as summarised in [8].
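
For a concrete (non-parametric) automaton, epsilon removal can be sketched as follows. This Python fragment uses the standard epsilon-closure construction, which is not necessarily the exact method of [2] or of the FSM tool; the arc representation as `(source, label, destination)` triples, the use of `None` as the epsilon label, and the tiny example automaton are assumptions for illustration. Unreachable states are not pruned.

```python
EPS = None  # label used here for epsilon (internal) transitions


def eps_closure(state, arcs):
    """States reachable from `state` using only epsilon arcs."""
    seen, stack = {state}, [state]
    while stack:
        u = stack.pop()
        for (src, label, dst) in arcs:
            if src == u and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def remove_epsilons(states, arcs, finals):
    """Return (arcs, finals) of a language-equivalent FSA without epsilons."""
    new_arcs, new_finals = set(), set()
    for q in states:
        closure = eps_closure(q, arcs)
        # q is final if it can reach a final state silently.
        if closure & finals:
            new_finals.add(q)
        # q inherits every labelled arc leaving its epsilon closure.
        for (src, label, dst) in arcs:
            if src in closure and label is not EPS:
                new_arcs.add((q, label, dst))
    return new_arcs, new_finals


# Hypothetical three-state example: 0 --eps--> 1 --Treq--> 2 (final).
arcs_out, finals_out = remove_epsilons(
    {0, 1, 2}, {(0, EPS, 1), (1, "Treq", 2)}, {2})
```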

Let $FSA_{\epsilon l} = (V_{\epsilon l}, T', A_{\epsilon l}, v_{0\epsilon l}, F_{\epsilon l})$
($l \in N^+$) be the epsilon-removed FSA obtained from $FSA_{OG_l}$. By
applying the epsilon removing method [2] to $FSA_{OG_l}$, we obtain a recursive
formula for $FSA_{\epsilon l}$, which is stated as follows.

Theorem 1 For $l \in N^+$,

(i) $FSA_{\epsilon 1} = (V_{\epsilon 1}, T', A_{\epsilon 1}, v_{0\epsilon 1}, F_{\epsilon 1})$, where

$V_{\epsilon 1} = \{S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S12\}$   (1)

$T' = \{Treq, Tind, Tres, Tcnf, Rreq, Rind, RindU, RindP\}$   (2)

$A_{\epsilon 1}$ is represented in Table 1, where the source nodes of the arcs
are shown in the first row (column header), and the arc labels are shown in the
first column (row header). If there exists an arc $(v, t, v') \in A_{\epsilon 1}$,
the destination node $v'$ of this arc is put into the table cell corresponding
to the source node $v$ and the arc label $t$.

$v_{0\epsilon 1} = S1$   (3)

$F_{\epsilon 1} = \{S1, S4\}$   (4)

(ii) for $l \geq 2$, $FSA_{\epsilon l} = (V_{\epsilon l}, T', A_{\epsilon l}, v_{0\epsilon l}, F_{\epsilon l})$, where   (5)

$V_{\epsilon l} = V_{\epsilon (l-1)} \cup \{\ldots\}$   (6)

$A_{\epsilon l} = A_{\epsilon (l-1)} \cup A^{add}_{\epsilon l}$   (7)

where $A^{add}_{\epsilon l}$ is shown in Table 2, which follows the same format
as that used in Table 1, where ... = S2, ... = S10, ... = S4, ... = S12.

Table 2. $A^{add}_{\epsilon l}$, $l \geq 2$

$v_{0\epsilon l} = S1$   (8)

$F_{\epsilon l} = F_{\epsilon (l-1)} \cup \{\ldots\}$   (9)

We see that $FSA_{\epsilon 1}$ has 11 states (equation (1)) and 35 arcs
(Table 1). Once this FSA is generated (e.g. by using FSM), we can construct
$FSA_{\epsilon l}$ for $l \geq 2$ by following equations (5) to (9), including
Table 2, which involves adding to $FSA_{\epsilon (l-1)}$ four states
(equation (6)), $(12l + 10)$ arcs (Table 2), and one final state
(equation (9)), while the initial state of $FSA_{\epsilon l}$ is the same as
that for $FSA_{\epsilon (l-1)}$. Readers interested in the proof of Theorem 1
are referred to [7].
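
The growth described above can be tabulated with a few lines of Python. This is only arithmetic on the counts quoted in the text, under two assumptions: the extracted expression "12*1+10" is read as $12l + 10$ arcs added at step $l$, and the added states and arcs are all new (no overlap with those already present). The function name `fsa_size` is hypothetical.

```python
def fsa_size(l):
    """Return (states, arcs, final_states) of the epsilon-removed FSA."""
    if l < 1:
        raise ValueError("channel capacity must be at least 1")
    # Base case l = 1: 11 states, 35 arcs, final states {S1, S4}.
    states, arcs, finals = 11, 35, 2
    for i in range(2, l + 1):
        states += 4            # four states added per step (equation (6))
        arcs += 12 * i + 10    # assumption: "12*1+10" read as 12*l + 10
        finals += 1            # one final state added per step (equation (9))
    return states, arcs, finals
```

Under these assumptions the number of states grows linearly and the number of arcs quadratically in the channel capacity.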

4 Conclusion 

The contribution of this paper is twofold. Firstly, we show that a recursive
parametric non-deterministic FSA which includes internal operations can be
transformed to another recursive parametric non-deterministic FSA without
epsilons that preserves the CES service language for arbitrary channel
capacity $l$. Together with the result presented in [8] for the determinised
FSA, this result allows us to obtain a recursive parametric and deterministic
FSA for the CES language. Secondly, and more speculatively, we generalise our
methodology for determining service languages of finite-state systems to
infinite state systems that can be parameterised. This is possible when
structural regularities of the state space exist which allow it to be
represented recursively, and by proving that similar regularities hold for its
corresponding FSAs. Our next step will be to use the recursive representation
of the CES service language to verify the CES protocol against its service for
arbitrary channel capacity. We also plan to determine the general conditions
under which recursive representations for parametric FSAs can be preserved
under FSA transformations.

Abstract. We consider the state feedback control of parameterized
discrete event systems consisting of N similar processes. The basic idea
underlying the proposed approach is to exploit both the symmetry of
the system to be controlled and the symmetry of the control specification
in order to avoid the exploration of the entire state space. Under
some assumptions, it is shown that it suffices i) to synthesize off-line a
control policy for a small value of the parameter N, and ii) to infer on-line
a control policy for a larger system, consisting of an arbitrarily large
number of processes, from its current state and the previously calculated
control policy. Soundness of the new synthesis method is also established.



1 Preliminaries and Definitions 

The approach proposed in this paper relies on three main concepts developed
in the verification domain [1], but here exploited in the context of Supervisory
Control Theory (SCT) [2]: reduction, parameterization, and symmetry. It is
motivated by the issue of scalability arising from synthesis algorithms and
based on a general synthesis method that includes heuristics [3].

The concepts and results introduced in this section are part of the work
originally developed by Ramadge and Wonham and are borrowed from [4]. Let us
assume that the given discrete event system (DES) is modeled by an automaton
$G := (Q, \Sigma, \delta, q_0, Q_m)$, where $Q$ is a finite set of states;
$\Sigma$ is a finite set of events divided into two disjoint subsets $\Sigma_c$
and $\Sigma_u$ of controllable and uncontrollable events, respectively;
$\delta : Q \times \Sigma \to Q$ is the partial transition function; $q_0$ is
the initial state; and $Q_m$ is the subset of marked states, which represents
the completed tasks.

A state feedback control (SFBC) policy for $G$ is a total function
$f : Q \to \Gamma$, where $\Gamma := \{\Sigma' \subseteq \Sigma \mid \Sigma' \supseteq \Sigma_u\}$.
If $\sigma \in f(q)$, then $\sigma$ is enabled at $q$; otherwise, it is
disabled. For $\sigma \in \Sigma$, the predicate $f_\sigma$ on $Q$ is defined by
$f_\sigma(q) :\Leftrightarrow \sigma \in f(q)$. Thus a control policy $f$ may be
described by a family of predicates $\{f_\sigma \mid \sigma \in \Sigma\}$.

* The research described in this paper was supported in part by the Natural Sciences
and Engineering Research Council of Canada (NSERC) and the Fonds québécois de
la recherche sur la nature et les technologies (FQRNT).

Let $\delta(q, \sigma)!$ mean that $\delta(q, \sigma)$ is defined. The
controller, represented by $f$, and the DES, represented by $G$, are embodied
in a closed loop which is defined by
$G^f := (Q, \Sigma, \delta^f, q_0, Q_m)$, where
$\delta^f(q, \sigma) := \delta(q, \sigma)$ if $\delta(q, \sigma)!$ and
$\sigma \in f(q)$, and is undefined otherwise.

Let $Pred(Q)$ denote the set of all predicates on $Q$. For a fixed
$\sigma \in \Sigma$, the predicate transformer
$M_\sigma : Pred(Q) \to Pred(Q)$ is defined by

$M_\sigma(P)(q) :\Leftrightarrow (\delta(q, \sigma)! \Rightarrow P(\delta(q, \sigma)))$.

The predicate transformer $\langle \cdot \rangle : Pred(Q) \to Pred(Q)$ is defined by

$\langle P \rangle(q) :\Leftrightarrow (\forall w \mid w \in \Sigma_u^* : \delta(q, w)! \Rightarrow P(\delta(q, w)))$.

Let $P \in Pred(Q)$ be a predicate which represents the control specification to
be fulfilled. The reachability predicate $R(G, P)$ holds on those states that
can be reached in $G$ from $q_0$ via states satisfying $P$ (see [4] for a formal
definition). The fundamental property of controllability must be introduced to
deal with any control problem formulated in the framework of SCT. A predicate
$P \in Pred(Q)$ is controllable with respect to $G$ if $P \preceq R(G, P)$ and
$(\forall \sigma \mid \sigma \in \Sigma_u : P \preceq M_\sigma(P))$. Since
$P \neq false$ is controllable if and only if there exists a SFBC policy $f$
such that $R(G^f, true) = P$, the SFBC policy $f$ is given by
$(\forall \sigma \mid \sigma \in \Sigma_c : f_\sigma := M_\sigma(P))$. If $P$
fails to be controllable, following the conventional procedure,
$supCP(P) = R(G, \langle P \rangle)$ is then targeted. The corresponding optimal
(behaviorally least restrictive) SFBC policy $f^*$ is then synthesized. For all
$\sigma \in \Sigma_c$,
$f^*_\sigma(q) :\Leftrightarrow \delta(q, \sigma)! \wedge \langle P \rangle(\delta(q, \sigma))$.
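
The predicate transformers above can be made concrete on a small explicit automaton. The Python sketch below represents a predicate as the set of states on which it holds and the partial transition function as a dictionary; it assumes the standard reading in which the strings $w$ in the definition of $\langle P \rangle$ range over uncontrollable events. The function names (`m_sigma`, `angle`, `f_star`) and the four-state example are illustrative assumptions.

```python
def m_sigma(P, sigma, states, delta):
    """M_sigma(P): states where delta(q, sigma) is undefined or lands in P."""
    return {q for q in states
            if (q, sigma) not in delta or delta[(q, sigma)] in P}


def angle(P, delta, uncontrollable):
    """<P>: states from which every uncontrollable string stays inside P."""
    good = set(P)
    changed = True
    while changed:
        changed = False
        for q in list(good):
            for u in uncontrollable:
                # Drop q if one uncontrollable step leaves the current set.
                if (q, u) in delta and delta[(q, u)] not in good:
                    good.discard(q)
                    changed = True
                    break
    return good


def f_star(q, sigma, p_angle, delta):
    """Optimal policy: enable sigma at q iff it is defined and leads into <P>."""
    return (q, sigma) in delta and delta[(q, sigma)] in p_angle


# Hypothetical example: 0 --c--> 1 --u--> 2 --u--> 3, where state 3 violates P.
STATES = {0, 1, 2, 3}
DELTA = {(0, "c"): 1, (1, "u"): 2, (2, "u"): 3}
P = {0, 1, 2}
P_ANGLE = angle(P, DELTA, {"u"})
```

In this example the controllable event `c` must be disabled at state 0, because from state 1 the uncontrollable event `u` leads out of the specification in two steps.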

Let us consider a parameterized DES $G^N$, where $N$ is a parameter that
denotes the number of processes, defined from the finite composition of a
replicated structure $P_i := (Q_i, \Sigma \cup \Sigma_i, \delta_i, Q_{m,i})$,
where $Q_i$ is a finite set of indexed states; $\Sigma$ is a finite set of
non-indexed, controllable events; $\Sigma_i$ is a finite set of indexed events,
$\Sigma_i = \Sigma_{c,i} \cup \Sigma_{u,i}$;
$\delta_i : Q_i \times (\Sigma \cup \Sigma_i) \to Q_i$ is the partial transition
function; and $Q_{m,i}$ is the subset of marked states. The replicated structure
represents the behavior of similar processes. The parameter $N$ can be
substituted by any number $n \in \mathbb{N}$. The events that belong to
$\Sigma$ are shared by all processes and allow synchronization. The concept of
replicated structure is translated into a process similarity assumption (PSA),
where $\theta := (j/i)$, for $1 \leq i, j \leq N$, is a substitution such that
$i.\theta = j$ (see [5] for a full definition).

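Assumption 1.1 Process Similarity Assumption (PSA):

$(\forall i, j \mid 1 \leq i, j \leq N : P_j = P_i.\theta)$.

Therefore, a process can be derived from any other process by index substitution.
A global state $s \in Q^N$ is represented by a tuple of $N$ local states. Let
$s[i]$ denote the $i$-th component of $s$. The transition structure $G^N$ is
defined from a synchronous composition for events in $\Sigma$ and an
interleaving composition for events in each $\Sigma_i$. Thus,
$G^N := (Q^N, \Sigma^N, \delta^N, Q^N_m)$, where
$\Sigma^N = \Sigma \cup \Sigma_1 \cup \cdots \cup \Sigma_N$ and
$(\delta^N(s, \sigma))[i] = \delta_i(s[i], \sigma)$ if
$\sigma \in \Sigma \cup \Sigma_i$ and $(\delta^N(s, \sigma))[i] = s[i]$
otherwise. An instance of a parameterized DES is denoted by $(G^n, s_0)$, where
$s_0 \in Q^n$ is the initial state.

Definition 1.2. Let $n_0, n \in \mathbb{N}$, where $n_0 \leq n$.
$\mathcal{J}^n_{n_0} := \{J \mid J = \{j_1, \ldots, j_{n_0}\} \wedge 1 \leq j_1 < j_2 < \cdots < j_{n_0} \leq n\}$.

Definition 1.3. Let $J \in \mathcal{J}^n_{n_0}$ (in the sequel, the expression
"Let $J \in \mathcal{J}^n_{n_0}$" means "Let
$J = \{j_1, \ldots, j_{n_0}\} \in \mathcal{J}^n_{n_0}$"). The projection
operator $\uparrow_J$ on a global state $s \in Q^n$ is a function
$\uparrow_J : Q^n \to Q_{j_1} \times \cdots \times Q_{j_{n_0}}$ that is defined as
$\uparrow_J s := (s[j_1], \ldots, s[j_{n_0}])$.

Definition 1.4. Let $J \in \mathcal{J}^n_{n_0}$. The substitution operator
$\theta_J$ on a global state $s \in Q_{j_1} \times \cdots \times Q_{j_{n_0}}$ is
a function $\theta_J : Q_{j_1} \times \cdots \times Q_{j_{n_0}} \to Q^{n_0}$
that expresses the simultaneous replacement of process indices
$j_1, \ldots, j_{n_0}$ by process indices $1, \ldots, n_0$, respectively. It is
defined as $\theta_J s := (s[1].(1/j_1), \ldots, s[n_0].(n_0/j_{n_0}))$.

Definition 1.5. Let $J \in \mathcal{J}^n_{n_0}$. The projection operator
$\uparrow_J$ on an event $\sigma \in \Sigma^n$ is a function
$\uparrow_J : \Sigma^n \to \Sigma \cup \Sigma_{j_1} \cup \cdots \cup \Sigma_{j_{n_0}} \cup \{\epsilon\}$
that is defined as follows: $\uparrow_J \sigma := \sigma$, if
$\sigma \in \Sigma$ or $\sigma \in \Sigma_i$ and $i \in J$; and
$\uparrow_J \sigma := \epsilon$, if $\sigma \in \Sigma_i$ and $i \notin J$.

Definition 1.6. Let $J \in \mathcal{J}^n_{n_0}$. The substitution operator
$\theta_J$ on an event
$\sigma \in \Sigma \cup \Sigma_{j_1} \cup \cdots \cup \Sigma_{j_{n_0}} \cup \{\epsilon\}$
is a function
$\theta_J : \Sigma \cup \Sigma_{j_1} \cup \cdots \cup \Sigma_{j_{n_0}} \cup \{\epsilon\} \to \Sigma^{n_0} \cup \{\epsilon\}$
that is defined as $\theta_J \sigma := \sigma$, if $\sigma \in \Sigma$;
$\theta_J \sigma := \sigma.(k/j_k)$, if $\sigma \in \Sigma_{j_k}$ and
$j_k \in J$; and $\theta_J \epsilon := \epsilon$.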

2 The Synthesis Method 

A SFBC policy is synthesized from a particular instance of , say G", and a 
control specification. The latter can be given in two ways: i) by a parameterized 
predicate £ Pred(Q^) or ii) by a predicate G Pred(Q"“) with no < n. In 
both cases, and P" must satisfy the following similarity assumption (where 
Gj := 9jofj). 

Assumption 2.1 Specification Similarity Assumption (SSA) — Let s G Q". 
P"(s) (VJ I J G 

The synthesis method includes two parts: an off-line synthesis and an on-line
synthesis. The off-line synthesis consists in calculating an SFBC policy $f^{n_0}$, with
respect to $(G^{n_0}, s_0)$ and $P^{n_0}$, such that $R((G^{n_0})^{f^{n_0}}, \mathit{true}) = \mathit{supCP}(P^{n_0})$. This
can be done by using an appropriate synthesis algorithm.

Since $Q^n$ grows exponentially with respect to $n$, it is unrealistic to compute
$f^n$ with respect to $(G^n, s_0)$ and $P^n$ off-line for an arbitrarily large value of $n$. An
on-line synthesis approach is adopted instead. The computation of $f^n$ from $f^{n_0}$, where
$n_0 \le n$, is done in the following way:

$f^n(s) := \Sigma^n - \bigcup_{J \in \mathcal{J}^n_{n_0}} \theta_J^{-1}(\Sigma^{n_0} - f^{n_0}(\Theta_J s))$.

Note that $\Sigma^{n_0} - f^{n_0}(\Theta_J s)$ is a set of disabled events.

The worst-case computational complexity for $f^{n_0}$ is still exponential with
respect to $n_0$, but as $n_0$ is usually small this step becomes tractable. The
computation of $f^n(\cdot)$ is tractable with worst-case computational complexity in
O((n - n_0 +
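
The on-line rule of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: private events are modelled as (name, index) pairs, local states are assumed not to mention process indices (so substitution on states only renumbers positions), and the names online_policy and f2, as well as the toy mutual-exclusion policy, are hypothetical.

```python
from itertools import combinations

def online_policy(s, n0, shared, private_names, f_n0):
    """Return the events enabled at global state s (one local state per process)."""
    n = len(s)
    sigma_n = set(shared) | {(a, i) for a in private_names for i in range(1, n + 1)}
    sigma_n0 = set(shared) | {(a, k) for a in private_names for k in range(1, n0 + 1)}
    disabled = set()
    for J in combinations(range(1, n + 1), n0):   # every J with j_1 < ... < j_n0
        sub = tuple(s[j - 1] for j in J)          # project on J, renumber to 1..n0
        for ev in sigma_n0 - f_n0(sub):           # events the small policy disables
            if ev in shared:
                disabled.add(ev)                  # shared events keep their name
            else:
                name, k = ev
                disabled.add((name, J[k - 1]))    # map index k back to j_k
    return sigma_n - disabled

def f2(sub):
    """Toy policy for n0 = 2 (assumption): enter only if the other is not critical."""
    enabled = {("exit", 1), ("exit", 2)}
    if sub[1] != "crit":
        enabled.add(("enter", 1))
    if sub[0] != "crit":
        enabled.add(("enter", 2))
    return enabled

enabled = online_policy(("crit", "idle", "idle"), 2, set(), ["enter", "exit"], f2)
print(sorted(enabled))
```

With three processes and process 1 in its critical state, the pairs {1,2} and {1,3} each contribute one disabled "enter" event, so only process 1's events and the "exit" events remain enabled. The loop visits one index set per element of the family of subsets, which is where the dependence on n and n_0 in the on-line cost comes from.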

3 Soundness of the Synthesis Method 

The following proposition establishes that the predicate transformer (·), introduced
in Section 1, preserves Assumption 2.1.

Proposition 3.1. Let $s \in Q^n$. Then $(P^n)(s) \Leftrightarrow (\forall J \mid J \in \mathcal{J}^n_{n_0} : (P^{n_0})(\Theta_J s))$.

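The following proposition relates an SFBC policy achieving $\mathit{supCP}(P^n)$ to an
SFBC policy achieving $\mathit{supCP}(P^{n_0})$. Since a transformation of an event could
return $\varepsilon$, we use the convention that $f_\varepsilon(s)$ holds.

Proposition 3.2. Let $s \in Q^n$ and $\sigma \in \Sigma^n$. Then

$f^n_\sigma(s) \Leftrightarrow (\forall J \mid J \in \mathcal{J}^n_{n_0} : f^{n_0}_{\Theta_J \sigma}(\Theta_J s))$.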

Finally, the following theorem constitutes the main result of this paper. It 
establishes the soundness of the synthesis method. 

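Theorem 3.3. Let $s \in Q^n$. Then

$(\forall J \mid J \in \mathcal{J}^n_{n_0} : f^{n_0}(\Theta_J s) = \{\sigma \mid \sigma \in \Sigma^{n_0} : f^{n_0}_\sigma(\Theta_J s)\}) \Rightarrow$

$f^n(s) = \{\sigma \mid \sigma \in \Sigma^n : f^n_\sigma(s)\}$.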

In addition to the soundness property, it should be noted that the SFBC 
policy generated by the on-line synthesis algorithm is robust, because our method 
can be easily adapted to react dynamically to some perturbations (addition 
or deletion of a process) occurring in the system by taking into account the 
number of processes that are alive. Finally, we are presently investigating how 
to encompass the nonblocking and partial observation aspects in our framework. 
Abstract. Box-pushing games are a challenging problem for both man and
machine since it is not easy to find a minimal solution for them. This
paper describes a formal framework for solving the games via symbolic model
checking techniques. Since our method is automatic and sound, it gives a
minimal solution if model checking succeeds. However, the framework is not
complete, so it fails to find an answer when the state explosion problem
occurs. Push-Push, chosen as a case study, consists of 50 games. 43 games are
solved with NuSMV, but 7 fail due to the state explosion problem. We therefore
devise several optimization techniques to mitigate the state
explosion problem, such as abstraction, relay model checking, and efficient
counterexample generation. As a result, we solve all games with minimal
solutions.



1 Introduction 

This paper presents our formal framework for solving one-player computer games using
symbolic model checking techniques [1]. In particular, we are interested in solving
box-pushing games such as Push-Push, SokoMindPLUS and XSokoban with the
model checker NuSMV [2]. Since finding a minimal solution for box-pushing games is a
challenging problem in artificial intelligence, many search
algorithms have been extensively studied [3,4]. But they are hard-wired; that is,
separately designed for each game. Instead of developing a separate search
algorithm for each game, we describe a general framework for finding a minimal solution
via model checking. Since our method is automatic and sound, it gives a minimal
solution if model checking succeeds. However, our approach is not complete, so
it fails to find an answer when the state explosion problem occurs.

Push-Push is chosen as a case study and consists of 50 games. 43 games are solved
with NuSMV, but 7 fail due to the state explosion problem. We therefore devise
optimization techniques to mitigate the state explosion problem, such as
abstraction, relay model checking, and efficient counterexample generation. As a
result, we solve all games with minimal solutions.



2 The Game and Its Solving Framework

Box-Pushing Game. The playing area consists of squares, laid out on a rectangular
grid. Boxes and goals are placed throughout the playing area. There is an agent whose
job is to move each box to a goal square. The agent can push only one box at
a time and must push from behind the box. A square can be occupied by either a
box or the agent at any time, but not both. Pushing all the given boxes to the goal squares can be quite
challenging. Doing this in the minimum number of moves, which is called a minimal
solution, is much more difficult for both man and machine. In this sense, box-pushing
games can be regarded as a motion planning problem in robotics [5].

Formulation. Box-pushing games can be represented as a finite state automaton
$G = \langle Q, A, \delta, q_i, q_t \rangle$, where $Q$ is a set of states, $A$ = {left, right, up, down} is the set of
move actions, $\delta : Q \times A \to Q$ is the partial state transition function, and $q_i$ and $q_t$ are the
initial and goal states of $Q$. For convenience, we use $q$ and $q'$ to denote a current state
and a next state of $Q$, and $a \in A$ to denote an action. By definition, $\delta(q,a) = q'$ holds iff
the next state $q'$ is a possible outcome of executing the action $a$ in the current state $q$.
We say that an action $a$ is applicable in $q$ iff there is a state $q'$ such that
$\delta(q,a) = q'$. The image of $a$ in $q$, written $Img(q,a)$, is the state $q'$ such that $\delta(q,a) = q'$.

Traces are finite sequences of actions, that is, elements of $A^*$. We use $\varepsilon$ for the
zero-length trace, $\alpha$ and $\beta$ for generic traces, and $\alpha^{\frown}\beta$ for trace concatenation. The notions of
applicability and image easily generalize to traces. A trace $\alpha \in A^*$ is applicable in $q$ iff
one of the following holds:

$\alpha = \varepsilon$ ($\varepsilon$ is applicable in any state $q \in Q$), or

$\alpha = \langle a \rangle^{\frown}\beta$, $a$ is applicable in $q$, and $\beta$ is applicable in $Img(q,a)$.

Let $\alpha = \langle a \rangle^{\frown}\beta$. The image of $\alpha$ in $q$, written $Img(\alpha, q)$, is defined as $Img(\varepsilon, q) = q$ and
$Img(\langle a \rangle^{\frown}\beta, q) = Img(\beta, Img(q,a))$. Our goal is to find a trace, that is, a sequence of
actions, that serves as a solution for the game. Since many traces are possible in a
game, a minimal one must be computed as a solution. The sequence $\alpha$ is a solution
trace for the game $G$ iff $\alpha$ is applicable in $q_i$ and $Img(\alpha, q_i) = q_t$. In addition, $\alpha$ is called
a minimal solution trace if $|\alpha| \le |\beta|$, where $\beta$ denotes any solution trace.
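
The definitions of the partial transition function, applicability, and image can be made concrete with a small Python sketch. It is an illustration only: the state encoding (agent cell plus a frozenset of box cells), the helper names make_delta, applicable and img, and the one-row corridor level are assumptions, not part of the paper.

```python
MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

def make_delta(rows, cols, walls=frozenset()):
    """Build the partial transition function of a rows x cols level."""
    def free(cell, boxes):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and cell not in walls and cell not in boxes

    def delta(q, a):
        """Return the next state, or None when action a is not applicable in q."""
        agent, boxes = q
        dr, dc = MOVES[a]
        nxt = (agent[0] + dr, agent[1] + dc)
        if nxt in boxes:                          # pushing a box from behind
            beyond = (nxt[0] + dr, nxt[1] + dc)
            if not free(beyond, boxes):
                return None                       # blocked: only one box can be pushed
            return (nxt, (boxes - {nxt}) | {beyond})
        return (nxt, boxes) if free(nxt, boxes) else None

    return delta

def img(delta, trace, q):
    """Image of a trace in q, or None if the trace is not applicable in q."""
    for a in trace:
        q = delta(q, a)
        if q is None:
            return None
    return q

def applicable(delta, trace, q):
    return img(delta, trace, q) is not None

delta = make_delta(1, 4)                          # a one-row corridor with four squares
q_i = ((0, 0), frozenset({(0, 1)}))               # agent on the left, box next to it
print(img(delta, ["right", "right"], q_i))
print(applicable(delta, ["right"] * 3, q_i))
```

The empty trace leaves the state unchanged, matching the base case of the image definition, and a trace is rejected as soon as one of its actions is not applicable.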



Framework. Given $G = \langle Q, A, \delta, q_i, q_t \rangle$, a model $M = \langle S, I, R, L \rangle$ is constructed,
where $S = \{(q, a) \mid q \in Q, a \in A\}$, $I = \{(q_i, a) \mid a \in A\}$, $R(s, s')$ holds iff there is an action
$a'$ such that $s = (q, a) \wedge s' = (q', a') \wedge \delta(q, a) = q'$, and $L(s) = \{q \mid s = (q, a)\}$. Given the model
$M$, a minimal solution path $\pi$ for the game $G$ is obtained with model checking:

$$MC(M, AG\,\neg q_t) = \begin{cases} \varepsilon & \text{if } M \models AG\,\neg q_t, \\ \pi & \text{if } M \not\models AG\,\neg q_t, \end{cases}$$

where $AG\,\neg q_t$ means that there is no possibility to reach the goal state $q_t$ from the initial
states. Model checking generates a counterexample $\pi$ if the formula is
false. Otherwise, it gives $\varepsilon$, which means there is no solution at all. Since NuSMV
traverses the search space in BFS fashion, it gives the shortest counterexample.

Theorem 1. If $MC(M, AG\,\neg q_t) = \pi$, then $\pi$ is the minimal solution path.

Theorem 2. Let $MC(M, AG\,\neg q_t) = \pi$. If $\pi$ is a solution path, then there is a
corresponding solution trace $\alpha$ in $G$.

Solving games. Theorem 1 says that our approach is adequate, and Theorem 2 shows
that it is sound; i.e., whenever the counterexample $\pi = MC(M, AG\,\neg q_t)$ is generated, it
is a minimal solution for the game. Push-Push is chosen as a case study and consists
of 50 games. In this way, 43 games are solved with NuSMV, but 7 fail due to the
state explosion problem [6]. To mitigate the state explosion problem, the next section
presents several optimization techniques.
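
The link between breadth-first traversal and a shortest counterexample can be illustrated with an explicit-state search. The sketch below is not symbolic model checking and does not use NuSMV; it is a plain BFS over a hypothetical one-dimensional corridor game (the names delta and shortest_trace, the corridor length, and the goal predicate are all assumptions made for the example). It returns a shortest trace reaching a goal state, or None when the goal is unreachable, which mirrors the two cases of MC.

```python
from collections import deque

LENGTH = 5  # hypothetical corridor with cells 0..4

def delta(q, a):
    """Partial transition function: q = (agent, box); returns None if not applicable."""
    agent, box = q
    step = -1 if a == "left" else 1
    nxt = agent + step
    if not 0 <= nxt < LENGTH:
        return None
    if nxt == box:                       # the agent pushes the box one cell further
        beyond = box + step
        if not 0 <= beyond < LENGTH:
            return None
        return (nxt, beyond)
    return (nxt, box)

def shortest_trace(q_init, is_goal, actions=("left", "right")):
    """Breadth-first search; the first goal state dequeued yields a minimal trace."""
    parent = {q_init: None}
    queue = deque([q_init])
    while queue:
        q = queue.popleft()
        if is_goal(q):
            trace = []
            while parent[q] is not None:
                q, a = parent[q]
                trace.append(a)
            return trace[::-1]
        for a in actions:
            q2 = delta(q, a)
            if q2 is not None and q2 not in parent:
                parent[q2] = (q, a)
                queue.append(q2)
    return None                          # goal unreachable: no solution at all

print(shortest_trace((0, 2), lambda q: q[1] == 4))
print(shortest_trace((2, 0), lambda q: q[1] == 4))
```

Because states are dequeued in order of increasing distance from the initial state, no shorter trace to a goal state can exist than the one returned, which is the explicit-state analogue of Theorem 1.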



3 Optimizations 

Abstraction. In this game, we found that there are deadlock positions from which a ball can never
be moved. These positions can be removed without affecting the minimal solution, so
that the total state space is significantly reduced. Let $M'$ be the reduced model,
an under-approximation of the original model. It is easy to see that there is a
preorder relation $M' \le M$ between the reduced model and the original one [7]. Then
$M' \not\models \varphi \Rightarrow M \not\models \varphi$ holds; that is, if an ACTL formula $\varphi$ fails in the reduced model,
then it fails in the original one. So we can perform model checking $MC(M', \varphi)$ instead of
$MC(M, \varphi)$. As a result, 6 of the 7 unsolved games are solved (for more details, see
[8]).

Relay model checking. Although almost all games are solved with abstraction, the
last (50th) game is not solved because it requires too many iterations. To avoid the problem, we divide
the whole formula $\varphi$ into several subformulas and then perform model checking on them one
by one. For this reason, it is called relay model checking. In particular, it is designed
to generate a counterexample when the formula $AG\,\neg(\varphi_1 \wedge \ldots \wedge \varphi_n)$ is false. When the
exact model checking of $M \models AG\,\neg(\varphi_1 \wedge \ldots \wedge \varphi_n)$ fails to complete, it can be considered as an
alternative method. With this technique, the last game is solved in 338 steps.
Unfortunately, this is not a minimal solution (for more details, see [9]).
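
Why a relayed search can lose minimality is easy to see on a small explicit graph. The sketch below is one plausible reading of the relay idea, not the authors' procedure: each stage runs a breadth-first search from the state reached by the previous stage until the next, stronger subgoal holds. The graph SUCC, the label sets PHI1 and PHI2, and the function names are hypothetical.

```python
from collections import deque

def bfs(succ, start, holds):
    """Shortest path (as a list of states) from start to a state where holds is true."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if holds(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in succ.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def relay(succ, start, stages):
    """Chain one search per subgoal, each starting where the previous one stopped."""
    path = [start]
    for holds in stages:
        leg = bfs(succ, path[-1], holds)
        if leg is None:
            return None
        path += leg[1:]
    return path

SUCC = {"S": ["A", "C"], "A": ["D"], "C": ["B"], "D": ["E"], "E": ["G"], "B": ["G"]}
PHI1 = {"A", "B", "G"}          # states satisfying the first subformula
PHI2 = {"G"}                    # states satisfying the second subformula
stages = [lambda n: n in PHI1, lambda n: n in PHI1 and n in PHI2]

relayed = relay(SUCC, "S", stages)
direct = bfs(SUCC, "S", stages[-1])
print(len(relayed) - 1, len(direct) - 1)
```

In this example the first stage commits to the nearest state satisfying the first subgoal, and the path from there to the final goal is longer than the globally shortest one. Each stage explores a smaller part of the state space, at the price of the minimality guarantee.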

Efficient counterexample generation. Although relay model checking gives an
answer for the last game, it does not guarantee a minimal solution. Thus we modify
the model checker NuSMV in three directions to generate counterexamples efficiently.

(1) Reducing the number of search space traversals: Generating a counterexample takes time and
space since NuSMV traverses the search space twice. We modify NuSMV to traverse the
search space only once when the CTL formula to be checked is false. Our experimental results
show that the modified NuSMV performs better than the original one; i.e., on
average, we obtain a 65% time improvement and a 17% space improvement in solving
the games.

(2) Storing BDDs on secondary memory: Solving games has high memory
requirements. 1GB is not sufficient to store intermediate computation results in
the form of BDDs. The original NuSMV stores intermediate
results only in main memory. To deal with higher memory requirements, we modify NuSMV to
store intermediate BDDs on hard disk. Our experimental results show that the modified
NuSMV performs better than the original one; i.e., on average, we obtain a 47%
space improvement in solving the games. While the 50th game is not solved with the
original NuSMV, it is solved in 174 minutes with the modified NuSMV. Since model
checking is done in a single run, the solution is optimal.

(3) Other traversals: NuSMV traverses the search space only in the backward
direction. Since the games we consider here have many different configurations, we
cannot say that backward traversal is always good for the games. For this
reason, we modify NuSMV to support various traversals; i.e., forward
traversal and hybrid traversals.

(4) Putting it all together: We have run experiments on solving Push-Push with all
optimization techniques combined. As a result, we obtain on average an 83% time
improvement and a 56% space improvement in solving the games compared to the
original NuSMV (for more details, see [10]).



4 Conclusions 

Model checking exhaustively explores the entire search space generated by the game to find
a minimal sequence of moves. In Push-Push, the search space is often huge. Although
model checking finds an optimal solution, it often suffers from the state
explosion problem. To overcome this well-known problem, we use three optimization
techniques. As a result, we solve all 50 Push-Push games with a minimal sequence of
moves. Surprisingly, we obtain on average an 83% time improvement and a 56% space
improvement in solving Push-Push compared to the original NuSMV.

In summary, optimizations are essential for successful game
solving. Thus we proposed several optimizations for solving box-pushing games. We
hope that our experience is valuable for lecturers teaching formal methods.
Abstract. We present a novel approach to static property checking for
RT-Level design verification. Our approach combines program-slicing
based static design extraction, word-level SAT solving and dynamic
searching techniques. The design extraction makes property checking
concentrate on the design parts related to the given properties, so
large practical designs can be handled. Constraint Logic Programming
(CLP) naturally models mixed bit-level and word-level constraints, and
the word-level SAT technique effectively solves the mixed constraints in a
unified framework, which greatly improves the performance of property
checking. Initial searching states derived from dynamic simulation dramatically
accelerate the searching process of property checking. A prototype
system has been built, and the experimental results on some public
benchmarks and industrial circuits demonstrate the efficiency of our approach
and its applicability to large practical designs.



1 Introduction 

With the increasing complexity of modern VLSI design, verification has become the
bottleneck of the design process. Model checking [1] is a technique for analyzing
a design for the validity of properties stated in temporal logic. The main problem
in model checking is state space explosion. Biere et al. [2] proposed a bounded
model checking procedure using SAT algorithms. However, practical RT-Level
descriptions often have word-level operators. Collapsing those word-level
operators into a single CNF formula destroys the regularity of the problem and
often makes the problem much harder to solve. There is some work on using
ATPG techniques for static property checking, such as bounded model checking [3].
However, ATPG-based methods work only at the gate level for property
checking. For a given HDL description, they need a synthesis step to convert
the RT-Level or behavior-level description into a gate-level netlist. Therefore, they
cannot efficiently handle large designs.

In this paper we propose a novel static property checking method using a
unified word-level SAT technique called constraint logic programming (CLP)
[4]. There are several advantages of our method: (1) Design extraction can dramatically
reduce the design complexity and make the subsequent process concentrate
on the property-related design, so large practical designs can be handled efficiently.
(2) CLP can model and solve bit-level and word-level
constraints in a unified framework, so higher performance can be obtained than
with methods that handle designs only at the gate level or with hybrid methods. Experimental
results show that our method achieves a promising speedup and efficient memory
usage compared with SAT-based methods. (3) Our method can directly handle
RT-Level or behavior-level HDL design descriptions without synthesizing
them into gate-level netlists. (4) Because CLP combines the expressiveness of logic
programming with constraint-solving techniques, our method bridges the gap
between EDA research and research progress in the constraint satisfaction problem
and artificial intelligence areas.



Fig. 1. Property checking framework



2 The Property-Checking Framework 

The property checking framework is shown in Fig. 1. The input of our framework
is RTL or behavior-level Verilog code with embedded properties or properties
defined in separate files. We define our properties similarly to assertions defined in
SystemVerilog [5], and support only a subset of SystemVerilog assertions. The assertions
are written as Verilog comments with "// A:" as the prefix. Readers can refer to [5] for
the detailed assertion definition. Our framework compiles the inputs to extract the
embedded properties. For each property, the framework generates a negated form
by inverting the conditional expressions used in the property.
Then, a circuit extractor [6] extracts the Verilog code related to the property
from the whole design description, and only the extracted code is used for
static property checking. After transforming the extracted Verilog code to
CLP constraints, GNU Prolog [4] is used to solve the constraint equations to
generate test data that violate the given property.

There are two possible results:

- The set of constraints cannot be satisfied. In this case, if the preset bound
limit, time limit, or memory limit is not exceeded, the constraint generation
and solving processes continue with one more time frame expanded.

- The set of constraints can be satisfied. In this case, the GNU Prolog solver
can compute one possible solution or all the solutions for that set of constraints.
We can then conclude that there are errors in the design. By mapping
the values of the solutions onto the input ports along the time frames,
we can build the counterexample that exposes the error monitored by the assertion.

All of the operations included in the framework can be done automatically.
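
The two outcomes described above amount to an iterative loop over time frames. The following Python sketch shows that control flow only; it is not the prototype described in the paper. A brute-force enumeration of input sequences stands in for the CLP solver, and the toy design (a 2-bit counter with an enable input), the negated property, and the names step, violates and check are assumptions made for the example.

```python
from itertools import product

def step(count, en):
    """Toy design (assumption): a 2-bit counter that increments when en == 1."""
    return (count + en) % 4

def violates(count):
    """Negated property: the asserted condition 'count != 3' is false."""
    return count == 3

def check(bound):
    """Expand one time frame at a time until a violation is found or the bound is hit."""
    for k in range(1, bound + 1):                  # k = number of expanded time frames
        for inputs in product((0, 1), repeat=k):   # stand-in for constraint solving
            count = 0
            for en in inputs:
                count = step(count, en)
            if violates(count):
                return k, inputs                   # input values per frame = counterexample
    return None                                    # unsatisfiable up to the bound

print(check(5))
```

When the constraints are unsatisfiable at depth k, the loop simply moves on to k + 1, and the first satisfying assignment is mapped back to per-frame input values, which is the counterexample. A real solver would replace the inner enumeration, and time and memory limits would be checked alongside the bound.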

3 Implementation and Experiments 

We have implemented a prototype framework that adopts the proposed static 
properties-checking techniques. The framework is applied to several public 
benchmarks and some practical circuits. Table 1 shows the characteristics of 
these designs. 



Table 1. Circuit statistics

Name                 Code Lines   PIs   Assertions to be Checked
MPEG Decoder         2035         8     1
Decoder              2092         14    1
PPC60X BIP           3583         23    2
Stack Manage Unit    1467         20    1


The MPEG Decoder has one property (P1): when the signal start is set
to "0", all the states will be set to the initial state. The Decoder circuit has one
property (P2): it can properly deal with multiple interrupt signals according to
the architecture manual. PPC60X BIP has two properties (P3 and P4): whenever
the BG signals of the two CPUs rise to "1", they will go back to "0" at
some point in the future. The Stack Manage Unit (SMU) has one property (P5): a fill signal will be generated when
there are more valid entries than the low watermark in the stack cache. After
applying design extraction to all designs, the extracted design
descriptions related to the assertions contain no more than 300 lines of code.

We compiled the Verilog descriptions to BLIF files using VIS, then converted
the constraints from BLIF to CNF format to be solved by the Chaff SAT solver [7],
and compared the results with our method. Table 2 shows the experimental
results for static property checking conducted on these designs using a
Windows 2000 PC with an AMD 1.8GHz CPU and 256MB of memory. The Bound
column gives the maximum number of time frames expanded to check the assertion. The
CPU Time columns give the time for assertion checking. The Memory columns
indicate the memory usage for constraint solving, in megabytes. The
Clauses and Literals columns show the number of clauses and the number of
literals required by SAT, respectively. Table 2 shows
that the word-level SAT solver obtains good performance and efficient memory
usage compared with the SAT-based method.

Table 2. Experimental results

                   CLP                                 SAT
Property   Bound   CPU Time (Sec)        Memory   Clauses   Literals   CPU Time   Memory
                   CLP Gen    Solving    (MB)                          (Sec)      (MB)
P1         30      8.74       4.71       5.39     66334     159331     51.75      31.7
P2         4       0.50       0.01       1.01     1921      3549       11.01      4.3
P3         20      3.62       1.54       2.08     17407     39623      16.39      10.7
P4         20      3.58       1.39       2.11     17559     39771      16.57      11
P5         60      10.78      4.11       3.81     98397     234704     66.32      99.7
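
The comparison can be turned into per-property ratios with a few lines of Python. The figures are copied from Table 2; treating the CLP total as generation time plus solving time, and comparing it directly with the reported SAT CPU time, is an assumption of this sketch rather than something the paper states.

```python
# property -> (CLP gen s, CLP solving s, CLP memory MB, SAT CPU s, SAT memory MB)
TABLE2 = {
    "P1": (8.74, 4.71, 5.39, 51.75, 31.7),
    "P2": (0.50, 0.01, 1.01, 11.01, 4.3),
    "P3": (3.62, 1.54, 2.08, 16.39, 10.7),
    "P4": (3.58, 1.39, 2.11, 16.57, 11.0),
    "P5": (10.78, 4.11, 3.81, 66.32, 99.7),
}

def ratios(row):
    """Return (time ratio, memory ratio) of SAT over CLP for one table row."""
    gen, solving, clp_mb, sat_s, sat_mb = row
    return sat_s / (gen + solving), sat_mb / clp_mb

for name, row in TABLE2.items():
    time_ratio, mem_ratio = ratios(row)
    print(f"{name}: time x{time_ratio:.1f}, memory x{mem_ratio:.1f}")
```

Under this reading, every property is checked faster and with less memory by the CLP flow than by the SAT flow, with the largest memory gap on P5.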
Abstract. Many circuit designs need to follow some temporal rules.
However, it has been hard to express and verify them. Therefore,
a temporal assertion extension to Verilog, called Temporal Wizard, is
proposed in this paper. It provides several Verilog system tasks that let
users write assertions directly in the testbench. Two new concepts, tag
and thread, are also introduced so that data can be associated with
temporal assertions, providing more functionality than previous
temporal assertion checkers.

Keywords: temporal assertion, PSL, verification 



1 Introduction 

Many circuit designs exhibit temporal behaviors. In the past, there was no good
solution to express rules for these temporal behaviors and it was difficult to verify
them. As the complexity of circuits increases, it becomes more and more important
to find a way to express and verify these temporal assertions [1]. There are
several vendors providing various solutions, such as OpenVera [2] from Synopsys
and e from Verisity [3]. Recently, IBM Sugar has been adopted as a formal
specification language called Property Specification Language (PSL) [4]. All
these languages provide similar functionalities and can be used to describe temporal
assertions for simulation-based verification. However, designs are usually
written in hardware description languages like Verilog, and a simulator is used
to execute testbenches to verify the design. In order to verify assertions written
in another language, an interpreter is often used to interpret these assertions and
communicate with the simulator. This adds extra overhead to simulation. Therefore,
several vendors have started to support these languages directly in their simulators.
For example, Synopsys supports OpenVera in VCS, and Cadence supports PSL
in NC-Verilog. This approach significantly reduces the overhead. However, it
also makes designs not portable to simulators from other vendors. For instance,
a design using in-line PSL cannot be simulated by VCS, which makes Intellectual
Property (IP) or Verification Intellectual Property (VIP) sharing difficult.
In order to solve this problem, SystemVerilog [5] has been adopted by many
simulator vendors. But it may take a few years before the standard is finalized
and supported by all simulators.

Besides, using another language to express temporal assertions in a design
is often overkill. Temporal assertions used in testbenches and designs are
often very simple, and it is not worthwhile to learn a new language just to express
some simple temporal assertions. It is much more user-friendly if the user
can express these assertions in Verilog directly. The Open Verification Library
(OVL) [6] took this approach. However, it is not an expressive property language,
which limits its application.

Therefore, a new way to express temporal assertions in native Verilog, called
"Temporal Wizard," is proposed in this paper. It provides a set of Verilog
User-defined System Tasks/Functions (USTF) and is compatible with all Verilog
simulators that support PLI. Two new concepts, thread and tag, are introduced
into temporal assertions. Compared with some state-machine-based temporal
assertion checkers [7, 8], thread and tag enhance the power of temporal assertion
checkers significantly.

2 Thread, Tag, and Temporal Wizard Usage 

2.1 Terminology 

Sequence: A textual representation of temporal assertions. Each Sequence corresponds
to a Verilog USTF. It is the basic element in Temporal Wizard.

Thread: A partially checked Sequence. It has its own status and represents a
temporal event stream.

Tag: Data that is associated with a variable and carried by a Sequence thread.

Trigger: The start of a Sequence check.



2.2 Thread and Tag 

Thread. Once a Sequence checker is triggered, there may be several streams
of events being checked at the same time. For example, if events "a b c d e"
are expected to occur in sequence, and the events occur in the order "a b c d a
b," then there will be two possible event streams that satisfy the rule, each with
its own state. One has events "a b c d" checked, and the other has events "a
b" checked. Since there may be multiple streams of the same assertion flowing
concurrently, there should be a way to represent these streams so that they
can be handled. In Temporal Wizard, these streams are represented by threads.
Every thread has a unique ID, through which the thread can be manipulated.
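
The "a b c d a b" example can be reproduced with a very small Python model of threads. This is an illustrative sketch, not Temporal Wizard's implementation: each thread is reduced to an ID and a program counter, a new thread is spawned whenever the first event of the Sequence occurs, and a thread simply keeps waiting when the current event is not the one it expects (a simplifying assumption). The function name run_checker is hypothetical.

```python
def run_checker(sequence, events):
    """Track one thread per partially matched occurrence of the sequence."""
    threads = []                       # each thread: {"id": ..., "pc": ...}
    next_id = 1
    for ev in events:
        for t in threads:              # advance the threads that expect this event
            if t["pc"] < len(sequence) and ev == sequence[t["pc"]]:
                t["pc"] += 1
        if ev == sequence[0]:          # the first event starts a new stream
            threads.append({"id": next_id, "pc": 1})
            next_id += 1
    return threads

threads = run_checker("abcde", "abcdab")
for t in threads:
    print(t["id"], "abcde"[:t["pc"]])
```

After the six events, two threads are alive: the first has checked "a b c d" and still waits for "e", and the second has checked "a b". The unique ID is what lets the testbench address one stream without disturbing the other.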



Tag. Since there may be several threads spawned from the same Sequence, it
is useful if some data can be carried by the thread and reused later. For
example, to express the following temporal rule, we have to pick up
a value and carry it along with the temporal flow.

Event2 should occur after event1, and variable V should have the same value
when either event occurs.

In this case, we should pick up the value of V when event1 occurs and compare
it with variable V when event2 occurs. This is why a tag is necessary. A tag is
a data handler in a Sequence that can be used to attach data to a thread. It is
always associated with a variable. It can be used to save data to a thread or
load data from a thread. It can also be used to qualify an event. Tag enhances
the power of temporal assertion checkers because it enables the qualification of
an event with the value of another variable.

// Sequence Definition Block
handle1 = $tb_seq_range(clock, $tb_range(1, 2), event1);
handle2 = $tb_seq(clock, event2, handle1);

// Sequence Trigger Block
$tb_seq_trigger(handle2, result_variable);

// Result Handling Block
always @(result_variable[1])
    $display("Assertion failure");
always @(result_variable[0])
    $display("Assertion Success");

Fig. 1. Sequence Usage
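
The rule "event2 should occur after event1, and V should have the same value at both" can be modelled in Python to show what a tag carries. The sketch is illustrative only: the trace format (a list of (event, V) samples), the first-in-first-out pairing of event1 with event2, and the function name check_same_value are assumptions, not Temporal Wizard behaviour.

```python
def check_same_value(trace):
    """Return one verdict per completed thread for the rule on event1, event2 and V."""
    threads = []                                 # pending threads, oldest first
    verdicts = []
    for event, v in trace:
        if event == "event1":
            threads.append({"tag": v})           # save V into the new thread's tag
        elif event == "event2" and threads:
            thread = threads.pop(0)              # the oldest pending thread completes
            verdicts.append(thread["tag"] == v)  # load the tag to qualify event2
    return verdicts

trace = [("event1", 5), ("event1", 7), ("event2", 5), ("event2", 9)]
print(check_same_value(trace))
```

Each thread keeps its own copy of V, so two overlapping occurrences of the rule are judged independently: here the first pair matches and the second does not.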

Thread and tag also provide the bridge for Temporal Wizard to interact
with auxiliary Verilog code by reading and setting variables in the testbench or the
design. With this auxiliary code, some complicated assertions that cannot be
expressed by Temporal Wizard alone can be written and checked.

The concept of thread and tag exists implicitly in many temporal assertion
languages and applications. For example, the "assert_data_used" checker in
SystemVerilog [9] and the "auxiliary variable" proposed by Ziv [10] are both based
on it. By making this concept explicit in Temporal Wizard, the power of thread
and tag can be utilized more fully and efficiently.



2.3 Temporal Wizard Syntax and Usage 

A complete Temporal Wizard usage has three blocks: assertion definition,
Sequence trigger, and result handling. A result variable is given to the trigger, and
assertion success or failure is returned there. A typical usage of Temporal
Wizard is given in Figure 1. All the system tasks provided in Temporal Wizard
are categorized into three groups. The first group is for temporal properties
and includes basic, timing constraint, order constraint, flow control, and logic
expression tasks. The second group is for thread manipulation, and the third
group is for tags.



3 Implementation of Assertion Checker 

3.1 Data Structure 

Temporal Wizard has two main data structures: Sequence and Thread. A Sequence
represents the assertion that the user writes, and a thread is created dynamically
at run time. The data saved in a tag is also carried by the thread.
A thread links to a clock and an event and is evaluated when the clock changes
or when the event occurs. It has a Program Counter (PC) which points to the
event currently being monitored.



3.2 Algorithms 

A Sequence thread has two states: active and inactive. In the active state, the thread
is waiting to be evaluated. In the inactive state, the thread is not waiting for any
event; it is waiting for its child thread to resume it. A thread has four stages in its
life cycle: setup, evaluate, execute, and finalize. The state and stage transition
diagram of a thread is given in Figure 2, and the four stages in the transition
diagram are described below:

1. Setup: The thread is created in the setup stage. Its PC, which points to the event
being monitored, and its clock are also set up here.

2. Evaluate: The thread waits for the event being monitored to occur in this
stage. When an event occurs, it evaluates the event according to the Sequence
type and determines whether it is a success event or a fail event. Then it goes
to the execute stage.

3. Execute: This stage can be reached either from the evaluate stage or
from a resumed thread. A success/fail flag is passed from either source.
According to the next event that the PC points to and the success/fail flag, different
actions are taken in this stage:

If the fail flag is set, go to the finalize stage.

If the success flag is set, there are three possibilities:

a. If the next PC points to an event, advance the PC to that event and go to the
evaluate stage.

b. If the next PC points to a Sequence, set up a new thread for that Sequence,
and go to the inactive state.

c. If the next PC points to null, the Sequence checking is finished; go to the finalize
stage and pass success to it.

4. Finalize: Some clean-up is done and the thread is destroyed in this stage.
If the thread was created by another thread, it resumes its parent thread.
Otherwise it updates the result variable given in $tb_seq_trigger.

Take the following Sequence as an example, and assume there is no violation.
Thread1 is the thread for h1, and thread2 is the thread for h2.

h1 = $tb_seq_range($tb_posedge(clk), $tb_range(2, 4), e4);
h2 = $tb_seq($tb_posedge(clk), e1, e2, h1, e5);

1. Thread2 is created. It monitors clk.

2. Posedge(clk) occurs and e1 is 1; thread2 is evaluated as success, executed,
and its PC advances to e2.

3. Posedge(clk) occurs and e2 is 1; thread2 is evaluated as success, executed,
and its PC advances to h1. Since h1 is a Sequence, thread1 is set up, and
thread2 goes to the inactive state. Thread1 monitors clk and e4.

4. Assume e4 is 1 between the 2nd and 4th posedge(clk); thread1 is evaluated as success
and executed. Since its next PC points to null, thread1 is finalized
with the success flag set, and thread2 is resumed and executed. Thread2's PC is
then advanced to e5.

5. Posedge(clk) occurs and e5 is 1; thread2 is evaluated as success and is
executed. Since its next PC is null, thread2 is finalized, and the result variable
is set to success.



Fig. 2. Thread State and Stage Transition Diagram
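
The life cycle walked through above can be simulated in Python. The sketch is a simplified model written for illustration, not the Temporal Wizard implementation: Sequences are plain tuples, a failed event ends the thread immediately, a range Sequence succeeds when its event is 1 at a clock tick inside the window, and all names (seq, seq_range, Thread, run) as well as the event names in the example trace are assumptions.

```python
def seq(*items):
    return ("seq", items)                    # items: event names or nested Sequences

def seq_range(lo, hi, event):
    return ("range", lo, hi, event)          # event must be 1 within ticks lo..hi

class Thread:
    def __init__(self, sequence):            # setup stage
        self.sequence, self.pc, self.ticks = sequence, 0, 0
        self.child = None                     # not None while the thread is inactive
        self.done = None                      # None while running, else True/False
        self._enter()

    def _enter(self):
        """If the PC points to a nested Sequence, set up a child and go inactive."""
        if self.sequence[0] == "seq" and isinstance(self.sequence[1][self.pc], tuple):
            self.child = Thread(self.sequence[1][self.pc])

    def _evaluate(self, values):              # evaluate stage: True, False, or None
        if self.sequence[0] == "range":
            _, lo, hi, event = self.sequence
            self.ticks += 1
            if values.get(event) and lo <= self.ticks <= hi:
                return True
            return False if self.ticks >= hi else None
        return bool(values.get(self.sequence[1][self.pc]))

    def _execute(self, ok):                   # execute stage
        if ok is None:
            return                            # still waiting inside a range window
        if not ok or self.sequence[0] == "range":
            self.done = ok                    # finalize with the fail/success flag
            return
        self.pc += 1
        if self.pc == len(self.sequence[1]):
            self.done = True                  # next PC is null: finalize with success
        else:
            self._enter()

    def on_posedge(self, values):
        if self.child is not None:            # inactive: only the child is evaluated
            self.child.on_posedge(values)
            if self.child.done is not None:   # child finalized: resume the parent
                ok, self.child = self.child.done, None
                self._execute(ok)
        else:
            self._execute(self._evaluate(values))

def run(sequence, trace):
    thread = Thread(sequence)
    for values in trace:
        thread.on_posedge(values)
        if thread.done is not None:
            break
    return thread.done

h1 = seq_range(2, 4, "e4")
h2 = seq("e1", "e2", h1, "e5")
print(run(h2, [{"e1": 1}, {"e2": 1}, {}, {"e4": 1}, {"e5": 1}]))
```

The trace follows the five steps of the example: two clock edges advance the parent's PC, the nested Sequence runs in a child thread while the parent is inactive, and the parent resumes and finishes once the child finalizes with success.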



4 Conclusion 

A temporal assertion extension to Verilog, called Temporal Wizard, is proposed
in this paper, and it has been used successfully by Avery Design Systems to
design their verification IPs. Compared with other temporal assertion languages,
Temporal Wizard is more portable and much easier to use because the user does
not need to learn a new language. Two innovative concepts, tag and thread, are
also introduced in this paper. These two features enrich the power of temporal
assertion checkers and enable the user to write more complex temporal assertions
than before.